Search Results: "pb"

22 May 2023

Adnan Hodzic: rpi-microk8s-bootstrap: Automate RPI device conversion into Kubernetes cluster nodes with Terraform

Considering I ve created my own private cloud in my home as part of: wp-k8s: WordPress on privately hosted Kubernetes cluster (Raspberry Pi 4 + Synology).... The post rpi-microk8s-bootstrap: Automate RPI device conversion into Kubernetes cluster nodes with Terraform appeared first on FoolControl: Phear the penguin.

12 May 2023

Dirk Eddelbuettel: crc32c 0.0.2 on CRAN: Build Fixes

A first follow-up to the initial announcement just days ago of the new crc32c package. The package offers cyclical checksum with parity in hardware-accelerated form on (recent enough) intel cpus as well as on arm64. This follow-up was needed because I missed, when switching to a default static library build, that newest compilers would complain if -fPIC was not set. gcc-12 on my box was happy, gcc-13 on recent Fedora as used at CRAN was not. A second error was assuming that saying SystemRequirements: cmake would suffice. But hold on whippersnapper: macOS always has a surprise for you! As described at the end of the appropriate section in Writing R Extensions, on that OS you have to go the basement, open four cupboards, rearrange three shelves and then you get to use it. And then in doing so (from an added configure script) I failed to realize Windows needed a fallback. Gee. The NEWS entry for this (as well the initial release) follows.

Changes in version 0.0.2 (2023-05-11)
  • Explicitly set cmake property for position-independent code
  • Help macOS find its cmake binary as detailed also in WRE
  • Help Windows with a non-conditional Makevars.win pointing at cmake
  • Add more badges to README.md

Changes in version 0.0.1 (2023-05-07)
  • Initial release version and CRAN upload

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

9 May 2023

C.J. Collier: Instructions for installing Proxmox onto the Qotom device

These instructions are for qotom devices Q515P and Q1075GE. You can order one from Amazon or directly from Cherry Ni <export03@qotom.com>. Instructions are for those coming from Windows. Prerequisites: To find your windows network details, run the following command at the command prompt:
netsh interface ip show addresses
Here s my output:
PS C:\Users\cjcol> netsh interface ip show addresses "Wi-Fi"
Configuration for interface "Wi-Fi"
    DHCP enabled:                         Yes
    IP Address:                           172.16.79.53
    Subnet Prefix:                        172.16.79.0/24 (mask 255.255.255.0)
    Default Gateway:                      172.16.79.1
    Gateway Metric:                       0
    InterfaceMetric:                      50
Did you follow the instructions linked above in the prerequisites section? If not, take a moment to do so now.
Open Rufus and select the proxmox iso which you downloaded. You may be warned that Rufus will be acting as dd.
Don t forget to select the USB drive that you want to write the image to. In my example, the device is creatively called NO_LABEL .
You may be warned that re-imaging the USB disk will result in the previous data on the USB disk being lost.
Once the process is complete, the application will indicate that it is complete.
You should now have a USB disk with the Proxmox installer image on it. Place the USB disk into one of the blue, USB-3.0, USB-A slots on the Qotom device so that the system can read the installer image from it at full speed. The Proxmox installer requires a keyboard, video and mouse. Please attach these to the device along with inserting the USB disk you just created. Press the power button on the Qotom device. Press the F11 key repeatedly until you see the AMI BIOS menu. Press F11 a couple more times. You ll be presented with a boot menu. One of the options will launch the Proxmox installer. By trial and error, I found that the correct boot menu option was UEFI OS Once you select the correct option, you will be presented with a menu that looks like this. Select the default option and install. During the install, you will be presented with an option of the block device to install to. I think there s only a single block device in this celeron, but if there are more than one, I prefer the smaller one for the ProxMox OS. I also make a point to limit the size of the root filesystem to 16G. I think it will take up the entire volume group if you don t set a limit. Okay, I ll do another install and select the correct filesystem. If you read this far and want me to add some more screenshots and better instructions, leave a comment.

4 May 2023

Emanuele Rocca: UEFI Secure Boot on the Raspberry Pi

UPDATE: this post unexpectedly ended up on Hacker News and I received a lot of comments. The two most important points being made are (1) that Secure Boot on the RPi as described here is not actually truly secure. An attacker who successfully gained root could just mount the firmware partition and either add their own keys to the EFI variable store or replace the firmware altogether with a malicious one. (2) The TianCore firmware cannot be used instead of the proprietary blob as I mentioned. What truly happens is that the proprietary blob is loaded onto the VideoCore cores, then TianoCore is loaded onto the ARM cores. Thanks for the corrections.

A port of the free software TianoCore UEFI firmware can be used instead of the proprietary boot blob to boot the Raspberry Pi. This allows to install Debian on the RPi with the standard Debian Installer, and it also makes it possible to use UEFI Secure Boot. Note that Secure Boot had been broken on arm64 for a while, but it s now working in Bookworm!.

Debian Installer UEFI boot
To begin, you ll need to download the appropriate firmware files for the RPi3 or RPi4. I ve got a Raspberry Pi 3 Model B+ myself, so the rest of this document will assume an RPi3 is being installed.
Plug the SD card you are going to use as the RPi storage device into another system. Say it shows up as /dev/sdf. Then:
# Create an msdos partition table
$ sudo parted --script /dev/sdf mklabel msdos
# Create, format, and label a 10M fat32 partition
$ sudo parted --script /dev/sdf mkpart primary fat32 0% 10M
$ sudo mkfs.vfat /dev/sdf1
$ sudo fatlabel /dev/sdf1 RPI-FW
# Get the UEFI firmware onto the SD card
$ sudo mount /dev/sdf1 /mnt/data/
$ sudo unzip Downloads/RPi3_UEFI_Firmware_v1.38.zip -d /mnt/data/
$ sudo umount /mnt/data
At this point, the SD card can be used to boot the RPi, and you ll get a UEFI firmware.
Download the Bookworm RC 2 release of the installer, copy it to a USB stick as described in the Installation Guide, and boot your RPi from the stick. If for some reason booting from the stick does not happen automatically, enter the firmware interface with ESC and choose the USB stick from Boot Manager.
Proceed with the installation as normal, paying attention not to modify the firmware partition labeled RPI-FW. I initially thought it would be nice to reuse the firmware partition as ESP partition as well. However, setting the esp flag on makes the RPi unbootable. Either configuring the partition as ESP in debian-installer, or manually with sudo parted --script /dev/sda set 1 esp on, breaks boot. In case you accidentally do that, set it back to off and the edk2 firmware will boot again.
What I suggest doing in terms of partitioning is: (1) leave the 10M partition created above for the firmware alone, and (2) create another 512M or so ESP partition for EFI boot.
The installation should go smoothly till the end, but rebooting won t work. Doh. This is because of an important gotcha: the Raspberry Pi port of the TianoCore firmware we are using does not support setting UEFI variables persistently from a "High Level Operating System (HLOS)", which is the debian-installer in our case. Persistently is the keyword there: variables can be set and modified regularly with efibootmgr or otherwise, but crucially the modifications do not survive reboot. However, changes made from the firmware interface itself are persistent. So enter the firmware with ESC right after booting the RPi, select Boot Maintenance Manager Boot Options Add Boot Option Your SD card Your ESP partition EFI debian shimaa64.efi. Choose a creative name for your boot entry (eg: "debian"), save and exit the firmware interface. Bookworm should be booting fine at this point!

Enabling Secure Boot
Although the TianoCore firmware does support Secure Boot, there are no keys enrolled by default. To add the required keys, copy PK-0001.der, DB-0001.der, DB-0002.der, KEK-0001.der, and KEK-0002.der to a FAT32 formatted USB stick.
Here s a summary of the Subject field for each of the above:
PK-0001.der.pem
        Subject: O = Debian, CN = Debian UEFI Secure Boot (PK/KEK key), emailAddress = debian-devel@lists.debian.org
DB-0001.der.pem
        Subject: C = US, ST = Washington, L = Redmond, O = Microsoft Corporation, CN = Microsoft Windows Production PCA 2011
DB-0002.der.pem
        Subject: C = US, ST = Washington, L = Redmond, O = Microsoft Corporation, CN = Microsoft Corporation UEFI CA 2011
KEK-0001.der.pem
        Subject: O = Debian, CN = Debian UEFI Secure Boot (PK/KEK key), emailAddress = debian-devel@lists.debian.org
KEK-0002.der.pem
        Subject: C = US, ST = Washington, L = Redmond, O = Microsoft Corporation, CN = Microsoft Corporation KEK CA 2011
Plug the stick into the RPi, boot and enter the firmware interface with ESC. Select Device Manager Secure Boot Configuration Secure Boot Mode choose Custom Mode Custom Secure Boot Options PK Options Enroll PK choose PK-0001.der. Do the same for DB Options, this time choose DB-0001.der and DB-0002.der. As you may have guessed by now, the same must be done for KEK Options, but adding KEK-0001.der and KEK-0002.der. Save, exit, reboot. If everything went well, your RPi now has booted with Secure Boot enabled.
See https://wiki.debian.org/SecureBoot for the details on how to check whether Secure Boot has been enabled correctly and much more.

27 April 2023

Arturo Borrero Gonz lez: Kubecon and CloudNativeCon 2023 Europe summary

Post logo This post serves as a report from my attendance to Kubecon and CloudNativeCon 2023 Europe that took place in Amsterdam in April 2023. It was my second time physically attending this conference, the first one was in Austin, Texas (USA) in 2017. I also attended once in a virtual fashion. The content here is mostly generated for the sake of my own recollection and learnings, and is written from the notes I took during the event. The very first session was the opening keynote, which reunited the whole crowd to bootstrap the event and share the excitement about the days ahead. Some astonishing numbers were announced: there were more than 10.000 people attending, and apparently it could confidently be said that it was the largest open source technology conference taking place in Europe in recent times. It was also communicated that the next couple iteration of the event will be run in China in September 2023 and Paris in March 2024. More numbers, the CNCF was hosting about 159 projects, involving 1300 maintainers and about 200.000 contributors. The cloud-native community is ever-increasing, and there seems to be a strong trend in the industry for cloud-native technology adoption and all-things related to PaaS and IaaS. The event program had different tracks, and in each one there was an interesting mix of low-level and higher level talks for a variety of audience. On many occasions I found that reading the talk title alone was not enough to know in advance if a talk was a 101 kind of thing or for experienced engineers. But unlike in previous editions, I didn t have the feeling that the purpose of the conference was to try selling me anything. Obviously, speakers would make sure to mention, or highlight in a subtle way, the involvement of a given company in a given solution or piece of the ecosystem. But it was non-invasive and fair enough for me. On a different note, I found the breakout rooms to be often small. I think there were only a couple of rooms that could accommodate more than 500 people, which is a fairly small allowance for 10k attendees. I realized with frustration that the more interesting talks were immediately fully booked, with people waiting in line some 45 minutes before the session time. Because of this, I missed a few important sessions that I ll hopefully watch online later. Finally, on a more technical side, I ve learned many things, that instead of grouping by session I ll group by topic, given how some subjects were mentioned in several talks. On gitops and CI/CD pipelines Most of the mentions went to FluxCD and ArgoCD. At that point there were no doubts that gitops was a mature approach and both flux and argoCD could do an excellent job. ArgoCD seemed a bit more over-engineered to be a more general purpose CD pipeline, and flux felt a bit more tailored for simpler gitops setups. I discovered that both have nice web user interfaces that I wasn t previously familiar with. However, in two different talks I got the impression that the initial setup of them was simple, but migrating your current workflow to gitops could result in a bumpy ride. This is, the challenge is not deploying flux/argo itself, but moving everything into a state that both humans and flux/argo can understand. I also saw some curious mentions to the config drifts that can happen in some cases, even if the goal of gitops is precisely for that to never happen. Such mentions were usually accompanied by some hints on how to operate the situation by hand. Worth mentioning, I missed any practical information about one of the key pieces to this whole gitops story: building container images. Most of the showcased scenarios were using pre-built container images, so in that sense they were simple. Building and pushing to an image registry is one of the two key points we would need to solve in Toolforge Kubernetes if adopting gitops. In general, even if gitops were already in our radar for Toolforge Kubernetes, I think it climbed a few steps in my priority list after the conference. Another learning was this site: https://opengitops.dev/. Group On etcd, performance and resource management I attended a talk focused on etcd performance tuning that was very encouraging. They were basically talking about the exact same problems we have had in Toolforge Kubernetes, like api-server and etcd failure modes, and how sensitive etcd is to disk latency, IO pressure and network throughput. Even though Toolforge Kubernetes scale is small compared to other Kubernetes deployments out there, I found it very interesting to see other s approaches to the same set of challenges. I learned how most Kubernetes components and apps can overload the api-server. Because even the api-server talks to itself. Simple things like kubectl may have a completely different impact on the API depending on usage, for example when listing the whole list of objects (very expensive) vs a single object. The conclusion was to try avoiding hitting the api-server with LIST calls, and use ResourceVersion which avoids full-dumps from etcd (which, by the way, is the default when using bare kubectl get calls). I already knew some of this, and for example the jobs-framework-emailer was already making use of this ResourceVersion functionality. There have been a lot of improvements in the performance side of Kubernetes in recent times, or more specifically, in how resources are managed and used by the system. I saw a review of resource management from the perspective of the container runtime and kubelet, and plans to support fancy things like topology-aware scheduling decisions and dynamic resource claims (changing the pod resource claims without re-defining/re-starting the pods). On cluster management, bootstrapping and multi-tenancy I attended a couple of talks that mentioned kubeadm, and one in particular was from the maintainers themselves. This was of interest to me because as of today we use it for Toolforge. They shared all the latest developments and improvements, and the plans and roadmap for the future, with a special mention to something they called kubeadm operator , apparently capable of auto-upgrading the cluster, auto-renewing certificates and such. I also saw a comparison between the different cluster bootstrappers, which to me confirmed that kubeadm was the best, from the point of view of being a well established and well-known workflow, plus having a very active contributor base. The kubeadm developers invited the audience to submit feature requests, so I did. The different talks confirmed that the basic unit for multi-tenancy in kubernetes is the namespace. Any serious multi-tenant usage should leverage this. There were some ongoing conversations, in official sessions and in the hallway, about the right tool to implement K8s-whitin-K8s, and vcluster was mentioned enough times for me to be convinced it was the right candidate. This was despite of my impression that multiclusters / multicloud are regarded as hard topics in the general community. I definitely would like to play with it sometime down the road. On networking I attended a couple of basic sessions that served really well to understand how Kubernetes instrumented the network to achieve its goal. The conference program had sessions to cover topics ranging from network debugging recommendations, CNI implementations, to IPv6 support. Also, one of the keynote sessions had a reference to how kube-proxy is not able to perform NAT for SIP connections, which is interesting because I believe Netfilter Conntrack could do it if properly configured. One of the conclusions on the CNI front was that Calico has a massive community adoption (in Netfilter mode), which is reassuring, especially considering it is the one we use for Toolforge Kubernetes. Slide On jobs I attended a couple of talks that were related to HPC/grid-like usages of Kubernetes. I was truly impressed by some folks out there who were using Kubernetes Jobs on massive scales, such as to train machine learning models and other fancy AI projects. It is acknowledged in the community that the early implementation of things like Jobs and CronJobs had some limitations that are now gone, or at least greatly improved. Some new functionalities have been added as well. Indexed Jobs, for example, enables each Job to have a number (index) and process a chunk of a larger batch of data based on that index. It would allow for full grid-like features like sequential (or again, indexed) processing, coordination between Job and more graceful Job restarts. My first reaction was: Is that something we would like to enable in Toolforge Jobs Framework? On policy and security A surprisingly good amount of sessions covered interesting topics related to policy and security. It was nice to learn two realities:
  1. kubernetes is capable of doing pretty much anything security-wise and create greatly secured environments.
  2. it does not by default. The defaults are not security-strict on purpose.
It kind of made sense to me: Kubernetes was used for a wide range of use cases, and developers didn t know beforehand to which particular setup they should accommodate the default security levels. One session in particular covered the most basic security features that should be enabled for any Kubernetes system that would get exposed to random end users. In my opinion, the Toolforge Kubernetes setup was already doing a good job in that regard. To my joy, some sessions referred to the Pod Security Admission mechanism, which is one of the key security features we re about to adopt (when migrating away from Pod Security Policy). I also learned a bit more about Secret resources, their current implementation and how to leverage a combo of CSI and RBAC for a more secure usage of external secrets. Finally, one of the major takeaways from the conference was learning about kyverno and kubeaudit. I was previously aware of the OPA Gatekeeper. From the several demos I saw, it was to me that kyverno should help us make Toolforge Kubernetes more sustainable by replacing all of our custom admission controllers with it. I already opened a ticket to track this idea, which I ll be proposing to my team soon. Final notes In general, I believe I learned many things, and perhaps even more importantly I re-learned some stuff I had forgotten because of lack of daily exposure. I m really happy that the cloud native way of thinking was reinforced in me, which I still need because most of my muscle memory to approach systems architecture and engineering is from the old pre-cloud days. List of sessions I attended on the first day: List of sessions I attended on the second day: List of sessions I attended on third day: The videos have been published on Youtube.

18 April 2023

Matthew Garrett: PSA: upgrade your LUKS key derivation function

Here's an article from a French anarchist describing how his (encrypted) laptop was seized after he was arrested, and material from the encrypted partition has since been entered as evidence against him. His encryption password was supposedly greater than 20 characters and included a mixture of cases, numbers, and punctuation, so in the absence of any sort of opsec failures this implies that even relatively complex passwords can now be brute forced, and we should be transitioning to even more secure passphrases.

Or does it? Let's go into what LUKS is doing in the first place. The actual data is typically encrypted with AES, an extremely popular and well-tested encryption algorithm. AES has no known major weaknesses and is not considered to be practically brute-forceable - at least, assuming you have a random key. Unfortunately it's not really practical to ask a user to type in 128 bits of binary every time they want to unlock their drive, so another approach has to be taken.

This is handled using something called a "key derivation function", or KDF. A KDF is a function that takes some input (in this case the user's password) and generates a key. As an extremely simple example, think of MD5 - it takes an input and generates a 128-bit output, so we could simply MD5 the user's password and use the output as an AES key. While this could technically be considered a KDF, it would be an extremely bad one! MD5s can be calculated extremely quickly, so someone attempting to brute-force a disk encryption key could simply generate the MD5 of every plausible password (probably on a lot of machines in parallel, likely using GPUs) and test each of them to see whether it decrypts the drive.

(things are actually slightly more complicated than this - your password is used to generate a key that is then used to encrypt and decrypt the actual encryption key. This is necessary in order to allow you to change your password without having to re-encrypt the entire drive - instead you simply re-encrypt the encryption key with the new password-derived key. This also allows you to have multiple passwords or unlock mechanisms per drive)

Good KDFs reduce this risk by being what's technically referred to as "expensive". Rather than performing one simple calculation to turn a password into a key, they perform a lot of calculations. The number of calculations performed is generally configurable, in order to let you trade off between the amount of security (the number of calculations you'll force an attacker to perform when attempting to generate a key from a potential password) and performance (the amount of time you're willing to wait for your laptop to generate the key after you type in your password so it can actually boot). But, obviously, this tradeoff changes over time - defaults that made sense 10 years ago are not necessarily good defaults now. If you set up your encrypted partition some time ago, the number of calculations required may no longer be considered up to scratch.

And, well, some of these assumptions are kind of bad in the first place! Just making things computationally expensive doesn't help a lot if your adversary has the ability to test a large number of passwords in parallel. GPUs are extremely good at performing the sort of calculations that KDFs generally use, so an attacker can "just" get a whole pile of GPUs and throw them at the problem. KDFs that are computationally expensive don't do a great deal to protect against this. However, there's another axis of expense that can be considered - memory. If the KDF algorithm requires a significant amount of RAM, the degree to which it can be performed in parallel on a GPU is massively reduced. A Geforce 4090 may have 16,384 execution units, but if each password attempt requires 1GB of RAM and the card only has 24GB on board, the attacker is restricted to running 24 attempts in parallel.

So, in these days of attackers with access to a pile of GPUs, a purely computationally expensive KDF is just not a good choice. And, unfortunately, the subject of this story was almost certainly using one of those. Ubuntu 18.04 used the LUKS1 header format, and the only KDF supported in this format is PBKDF2. This is not a memory expensive KDF, and so is vulnerable to GPU-based attacks. But even so, systems using the LUKS2 header format used to default to argon2i, again not a memory expensive KDFwhich is memory strong, but not designed to be resistant to GPU attack (thanks to the comments pointing out my misunderstanding here). New versions default to argon2id, which is. You want to be using argon2id.

What makes this worse is that distributions generally don't update this in any way. If you installed your system and it gave you pbkdf2 as your KDF, you're probably still using pbkdf2 even if you've upgraded to a system that would use argon2id on a fresh install. Thankfully, this can all be fixed-up in place. But note that if anything goes wrong here you could lose access to all your encrypted data, so before doing anything make sure it's all backed up (and figure out how to keep said backup secure so you don't just have your data seized that way).

First, make sure you're running as up-to-date a version of your distribution as possible. Having tools that support the LUKS2 format doesn't mean that your distribution has all of that integrated, and old distribution versions may allow you to update your LUKS setup without actually supporting booting from it. Also, if you're using an encrypted /boot, stop now - very recent versions of grub2 support LUKS2, but they don't support argon2id, and this will render your system unbootable.

Next, figure out which device under /dev corresponds to your encrypted partition. Run

lsblk

and look for entries that have a type of "crypt". The device above that in the tree is the actual encrypted device. Record that name, and run

sudo cryptsetup luksHeaderBackup /dev/whatever --header-backup-file /tmp/luksheader

and copy that to a USB stick or something. If something goes wrong here you'll be able to boot a live image and run

sudo cryptsetup luksHeaderRestore /dev/whatever --header-backup-file luksheader

to restore it.

(Edit to add: Once everything is working, delete this backup! It contains the old weak key, and someone with it can potentially use that to brute force your disk encryption key using the old KDF even if you've updated the on-disk KDF.)

Next, run

sudo cryptsetup luksDump /dev/whatever

and look for the Version: line. If it's version 1, you need to update the header to LUKS2. Run

sudo cryptsetup convert /dev/whatever --type luks2

and follow the prompts. Make sure your system still boots, and if not go back and restore the backup of your header. Assuming everything is ok at this point, run

sudo cryptsetup luksDump /dev/whatever

again and look for the PBKDF: line in each keyslot (pay attention only to the keyslots, ignore any references to pbkdf2 that come after the Digests: line). If the PBKDF is either "pbkdf2" or "argon2i" you should convert to argon2id. Run the following:

sudo cryptsetup luksConvertKey /dev/whatever --pbkdf argon2id

and follow the prompts. If you have multiple passwords associated with your drive you'll have multiple keyslots, and you'll need to repeat this for each password.

Distributions! You should really be handling this sort of thing on upgrade. People who installed their systems with your encryption defaults several years ago are now much less secure than people who perform a fresh install today. Please please please do something about this.

comment count unavailable comments

25 March 2023

Gunnar Wolf: Now that we are talking about kernel building... What about firebuild?

After my last post, B lint (who prompted it with his last post) suggested I should do a hybrid test of his tests and my extremes. He suggested I should build the Linux kernel using my Raspberry Pi 4 (8GB model), but using the Firebuild build accelerator. Before going any further: I must make clear that while Firebuild is freely redistributable, it is not made available under a free license. It is free for personal use or commercial trial, but otherwise requires licensing. B lint managed to build a Linux kernel in just over 8 seconds. So, how did my test go? My previous experiment, using -j 4, built Linux in ~100 minutes; this was about a year ago, and I m now building linux 6.1, so I timed this again. To get a baseline, I built my kernel from a just-unpacked tree, just as usual:
# cd /usr/src/linux-source-6.1
# make clean
# make defconfig
# time make -j4
(...)
real    117m30.588s
user    392m41.434s
sys     52m2.556s
Of course, having all of the object files built makes the rebuild process quite faster (this is still done without firebuild). I understand calling make defconfig without cleaning does not change much, but I saw it often referenced in firebuild s docs, so I m leaving it:
# time make -j 4
(...)
real    0m43.822s
user    1m36.577s
sys     0m40.805s
Then, I did a first run using firebuild. Firebuild is a caching build optimizer, so the first run will naturally be somewhat slower (but if you often rebuild your kernel, it should be seen as an investment). Now, in the Raspberry Pi, that uses a slow SD card interface for its storage It is a heavy investment. The first time I built with firebuild, it meant almost a 100% build time hit:
# cd /usr/src/linux-source-6.1
# make clean
# make defconfig
# time firebuild make -j 4
(...)
real    212m58.647s
user    391m49.080s
sys     81m10.758s
Not only that; I am using a fairly decent and big 32GB card, but this is quite a big price to pay in such a limited system!
# du -sh .cache/firebuild/
4.2G    .cache/firebuild/
I did a build without cleaning the build directory, using firebuild, and it does help although not by so much as in higher performance systems:
# cd /usr/src/linux-source-6.1
# make clean
# make defconfig
# time firebuild make -j 4
(...)
real    68m6.621s
user    98m32.514s
sys     31m41.643s
So, it built in roughly 65% of the time it would take to build regularly. And what about rebuilding without cleaning?
# make defconfig
# time firebuild make -j 4
(...)
real    1m11.872s
user    2m5.807s
sys     1m46.178s
In this case, using firebuild worked roughly 30% slower than not using it. I guess the high number of file ops inside .cache/firebuild are to blame, as in the case of the media I m using, those are quite expensive; make went its way basically checking date stamps between *.c and *.o (yes, very roughly), and while running under firebuild, I suppose each of these meant an extra lookup inside the cache. So Experiment requested, experiment performed!

22 March 2023

Michael Prokop: Automatically unlocking a LUKS encrypted root filesystem during boot

Update on 2023-03-23: thanks to Daniel Roschka for mentioning the Mandos and TPM approaches, which might be better alternatives, depending on your options and needs. Peter Palfrader furthermore pointed me towards clevis-initramfs and tang. A customer of mine runs dedicated servers inside a foreign data-center, remote hands only. In such an environment you might need a disk replacement because you need bigger or faster disks, though also a disk might (start to) fail and you need a replacement. One has to be prepared for such a scenario, but fully wiping your used disk then might not always be an option, especially once disks (start to) fail. On the other hand you don t want to end up with (partial) data on your disk handed over to someone unexpected. By encrypting the data on your disks upfront you can prevent against this scenario. But if you have a fleet of servers you might not want to manually jump on servers during boot and unlock crypto volumes manually. It s especially annoying if it s about the root filesystem where a solution like dropbear-initramfs needs to be used for remote access during initramfs boot stage. So my task for the customer was to adjust encrypted LUKS devices such that no one needs to manually unlock the encrypted device during server boot (with some specific assumptions about possible attack vectors one has to live with, see the disclaimer at the end). The documentation about this use-case was rather inconsistent, especially because special rules apply for the root filesystem (no key file usage), we see different behavior between what s supported by systemd (hello key file again), initramfs-tools and dracut, not to mention the changes between different distributions. Since tests with this tend to be rather annoying (better make sure to have a Grml live system available :)), I m hereby documenting what worked for us (Debian/bullseye with initramfs-tools and cryptsetup-initramfs). The system was installed with LVM on-top of an encrypted Software-RAID device, only the /boot partition is unencrypted. But even if you don t use Software-RAID nor LVM the same instructions apply. The system looks like this:
% mount -t ext4 -l
/dev/mapper/foobar-root_1 on / type ext4 (rw,relatime,errors=remount-ro)
% sudo pvs
  PV                    VG     Fmt  Attr PSize   PFree
  /dev/mapper/md1_crypt foobar lvm2 a--  445.95g 430.12g
% sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  foobar   1   2   0 wz--n- 445.95g 430.12g
% sudo lvs
  LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root_1 foobar -wi-ao---- <14.90g
% lsblk
NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
[...]
sdd                     8:48   0 447.1G  0 disk
 sdd1                  8:49   0   571M  0 part  /boot/efi
 sdd2                  8:50   0   488M  0 part
   md0                 9:0    0   487M  0 raid1 /boot
 sdd3                  8:51   0 446.1G  0 part
   md1                 9:1    0   446G  0 raid1
     md1_crypt       253:0    0   446G  0 crypt
       foobar-root_1 253:1    0  14.9G  0 lvm   /
[...]
sdf                     8:80   0 447.1G  0 disk
 sdf1                  8:81   0   571M  0 part
 sdf2                  8:82   0   488M  0 part
   md0                 9:0    0   487M  0 raid1 /boot
 sdf3                  8:83   0 446.1G  0 part
   md1                 9:1    0   446G  0 raid1
     md1_crypt       253:0    0   446G  0 crypt
       foobar-root_1 253:1    0  14.9G  0 lvm   /
The actual crypsetup configuration is:
% cat /etc/crypttab
md1_crypt UUID=77246138-b666-4151-b01c-5a12db54b28b none luks,discard
Now, to automatically open the crypto device during boot we can instead use:
% cat /etc/crypttab 
md1_crypt UUID=77246138-b666-4151-b01c-5a12db54b28b none luks,discard,keyscript=/etc/initramfs-tools/unlock.sh
# touch /etc/initramfs-tools/unlock.sh
# chmod 0700 /etc/initramfs-tools/unlock.sh
# $EDITOR etc/initramfs-tools/unlock.sh
# cat /etc/initramfs-tools/unlock.sh
#!/bin/sh
echo -n "provide_the_actual_password_here"
# update-initramfs -k all -u
[...]
The server will then boot without prompting for a crypto password. Note that initramfs-tools by default uses an insecure umask of 0022, resulting in the initrd being accessible to everyone. But if you have the dropbear-initramfs package installed, its /usr/share/initramfs-tools/conf-hooks.d/dropbear sets UMASK=0077 , so the resulting /boot/initrd* file should automatically have proper permissions (0600). The cryptsetup hook warns about a permissive umask configuration during update-initramfs runs, but if you want to be sure, explicitly set it via e.g.:
# cat > /etc/initramfs-tools/conf.d/umask << EOF
# restrictive umask to avoid non-root access to initrd:
UMASK=0077
EOF
# update-initramfs -k all -u
Disclaimer: Of course you need to trust users with access to /etc/initramfs-tools/unlock.sh as well as the initramfs/initrd on your system. Furthermore you should wipe the boot partition (to destroy the keyfile information) before handing over such a disk. But that is a risk my customer can live with, YMMV.

21 March 2023

Gunnar Wolf: Impact of parallelism and processor architecture while building a kernel

Given that B lint just braggedblogged about how efficiently he can build a Linux kernel (less than 8 seconds, wow! Well, yes, until you read it is the result of aggressive caching and is achieved only for a second run), and that a question just popped up today on the Debian ARM mailing list, is an ARM computer a good choice? Which one? , I decided to share my results of an experiment I did several months ago, to graphically show to my students the effects of parallelism, the artifacts of hyperthreading, the effects of different architecture sets, and even illustrate about the actual futility of my experiment (somewhat referring to John Gustafson s reevaluation of Amdahl s law, already 30 years ago One does not take a fixed-size problem and run it on various numbers of processors except when doing academic research ; thanks for referring to my inconsequential reiterative compilations as academic research! ) I don t expect any of the following images to be groundbreaking, but at least, next time I need to find them it is quite likely I ll be able to find them and I will be able to more easily refer to them in online discussions So What did I do? I compiled Linux repeatedly, on several of the machines I had available, varying the -j flag (how many cores to use simultaneously), starting with single-core, and pushing up until just a bit over the physical number of cores the CPU has. Sadly, I lost several of my output images, but the three following are enough to tell interesting bits of the story: Of course, I have to add that this is not a scientific comparison; the server and my laptop have much better I/O than the Raspberry s puny micro-SD card (and compiling hundreds of thousands of files is quite an IO-stressed job, even though the full task does exhibit the very low compared single-threaded performance of the Raspberry even compared with the Yoga). No optimizations were done (they would be harmful to the effects I wanted to show!), the compile was made straight from the upstream sources.

13 March 2023

Russell Coker: Firebuild

After reading B lint s blog post about Firebuild (a compile cache) [1] I decided to give it a go. It s non-free, the project web site [2] says that it s free for non-commercial use or commercial trials. My first attempt at building a Debian package failed due to man-recode using a seccomp() sandbox, I filed Debian bug #1032619 [3] about this (thanks for the quick response B lint). The solution for me was to edit /etc/firebuild.conf and add man-recode to the dont_intercept list. The new version that s just been uploaded to Debian fixes it by disabling seccomp() and will presumably allow slightly better performance. Here are the results of building the refpolicy package with Firebuild, a regular build, the first build with Firebuild (30% slower) and a rebuild with Firebuild that reduced the time by almost 42%.
real    1m32.026s
user    4m20.200s
sys     2m33.324s
real    2m4.111s
user    6m31.769s
sys     3m53.681s
real    0m53.632s
user    1m41.334s
sys     3m36.227s
Next I did a test of building a Linux 6.1.10 kernel with make bzImage -j18 , here are the results from a normal build, first build with firebuild, and second build. The real time is worse with firebuild for this on my machine. I think that the relative speeds of my CPU (reasonably fast 18 core) and storage (two of the slower NVMe devices in a BTRFS RAID-1) is the cause of the first build being relatively so much slower for make bzImage than for building the refpolicy, as the kernel build process involves a lot more data. For the final build I moved ~/.cache/firebuild to a tmpfs (I have 128G of RAM and not much running on my machine at the time of the tests), even then building with firebuild was slightly slower in real time but took significantly less CPU time (user+real being 20mins instead of 36m). I also ran several tests with the kernel source tree on a tmpfs but for unknown reasons those tests each took about 6 minutes. Does firebuild or the Linux kernel build process dislike tmpfs for some reason?
real    2m43.020s
user    31m30.551s
sys     5m15.279s
real    8m49.675s
user    64m11.258s
sys     19m39.016s
real    3m6.858s
user    7m47.556s
sys     9m22.513s
real    2m51.910s
user    10m53.870s
sys     9m21.307s
One thing I noticed from the kernel build tests is that the total CPU time taken by the firebuild process (as reported by ps) was more than 2/3 of the run time and top usually reported it as taking around 75% of a CPU core. It seems to me that the firebuild process itself is a bottleneck on build speed. Building refpolicy without firebuild has an average of 4.5 cores in use while building the kernel haas 13.5. Unless they make a multi-threaded version of firebuild it seems that it won t give the performance one would hope for from a CPU with 18+ cores. I presume that if I had been running with hyper-threading enabled then firebuild would have been even worse for kernel builds as it would sometimes get on the second thread of a core. It looks like firebuild would perform better on AMD CPUs as they tend to have fewer CPU cores with greater average performance per core so a single CPU core for firebuild will be less limited. I presume that the firebuild developers will make it perform better with large numbers of cores in future, the latest Intel laptop CPUs have 16+ cores and servers with 2*40core CPUs are common. The performance improvement for refpolicy is significant as a portion of build time, but insignificant in terms of real time. A full build of refpolicy doesn t take enough time to get a Coke and reducing it doesn t offer a huge benefit, if Firebuild was available in past years when refpolicy took 20 minutes to build (when DDR2 was the best RAM available) then it would be a different story. There is some potential to optimise the build of refpolicy for the non-firebuild case. Getting it to average more than 4.5 cores in use when there s 18 available should be possible, there are a number of shell for loops in the main Makefile and maybe some of them can be replaced by make constructs to allow running in parallel. If it used 7 cores on average then it would be faster in a regular build than it currently is with firebuild and a hot cache. Any advice from make experts would be appreciated.

6 March 2023

Vincent Bernat: DDoS detection and remediation with Akvorado and Flowspec

Akvorado collects sFlow and IPFIX flows, stores them in a ClickHouse database, and presents them in a web console. Although it lacks built-in DDoS detection, it s possible to create one by crafting custom ClickHouse queries.

DDoS detection Let s assume we want to detect DDoS targeting our customers. As an example, we consider a DDoS attack as a collection of flows over one minute targeting a single customer IP address, from a single source port and matching one of these conditions:
  • an average bandwidth of 1 Gbps,
  • an average bandwidth of 200 Mbps when the protocol is UDP,
  • more than 20 source IP addresses and an average bandwidth of 100 Mbps, or
  • more than 10 source countries and an average bandwidth of 100 Mbps.
Here is the SQL query to detect such attacks over the last 5 minutes:
SELECT *
FROM (
  SELECT
    toStartOfMinute(TimeReceived) AS TimeReceived,
    DstAddr,
    SrcPort,
    dictGetOrDefault('protocols', 'name', Proto, '???') AS Proto,
    SUM(((((Bytes * SamplingRate) * 8) / 1000) / 1000) / 1000) / 60 AS Gbps,
    uniq(SrcAddr) AS sources,
    uniq(SrcCountry) AS countries
  FROM flows
  WHERE TimeReceived > now() - INTERVAL 5 MINUTE
    AND DstNetRole = 'customers'
  GROUP BY
    TimeReceived,
    DstAddr,
    SrcPort,
    Proto
)
WHERE (Gbps > 1)
   OR ((Proto = 'UDP') AND (Gbps > 0.2)) 
   OR ((sources > 20) AND (Gbps > 0.1)) 
   OR ((countries > 10) AND (Gbps > 0.1))
ORDER BY
  TimeReceived DESC,
  Gbps DESC
Here is an example output1 where two of our users are under attack. One from what looks like an NTP amplification attack, the other from a DNS amplification attack:
TimeReceived DstAddr SrcPort Proto Gbps sources countries
2023-02-26 17:44:00 ::ffff:203.0.113.206 123 UDP 0.102 109 13
2023-02-26 17:43:00 ::ffff:203.0.113.206 123 UDP 0.130 133 17
2023-02-26 17:43:00 ::ffff:203.0.113.68 53 UDP 0.129 364 63
2023-02-26 17:43:00 ::ffff:203.0.113.206 123 UDP 0.113 129 21
2023-02-26 17:42:00 ::ffff:203.0.113.206 123 UDP 0.139 50 14
2023-02-26 17:42:00 ::ffff:203.0.113.206 123 UDP 0.105 42 14
2023-02-26 17:40:00 ::ffff:203.0.113.68 53 UDP 0.121 340 65

DDoS remediation Once detected, there are at least two ways to stop the attack at the network level:
  • blackhole the traffic to the targeted user (RTBH), or
  • selectively drop packets matching the attack patterns (Flowspec).

Traffic blackhole The easiest method is to sacrifice the attacked user. While this helps the attacker, this protects your network. It is a method supported by all routers. You can also offload this protection to many transit providers. This is useful if the attack volume exceeds your internet capacity. This works by advertising with BGP a route to the attacked user with a specific community. The border router modifies the next hop address of these routes to a specific IP address configured to forward the traffic to a null interface. RFC 7999 defines 65535:666 for this purpose. This is known as a remote-triggered blackhole (RTBH) and is explained in more detail in RFC 3882. It is also possible to blackhole the source of the attacks by leveraging unicast Reverse Path Forwarding (uRPF) from RFC 3704, as explained in RFC 5635. However, uRPF can be a serious tax on your router resources. See NCS5500 uRPF: Configuration and Impact on Scale for an example of the kind of restrictions you have to expect when enabling uRPF. On the advertising side, we can use BIRD. Here is a complete configuration file to allow any router to collect them:
log stderr all;
router id 192.0.2.1;
protocol device  
  scan time 10;
 
protocol bgp exporter  
  ipv4  
    import none;
    export where proto = "blackhole4";
   ;
  ipv6  
    import none;
    export where proto = "blackhole6";
   ;
  local as 64666;
  neighbor range 192.0.2.0/24 external;
  multihop;
  dynamic name "exporter";
  dynamic name digits 2;
  graceful restart yes;
  graceful restart time 0;
  long lived graceful restart yes;
  long lived stale time 3600;  # keep routes for 1 hour!
 
protocol static blackhole4  
  ipv4;
  route 203.0.113.206/32 blackhole  
    bgp_community.add((65535, 666));
   ;
  route 203.0.113.68/32 blackhole  
    bgp_community.add((65535, 666));
   ;
 
protocol static blackhole6  
  ipv6;
 
We use BGP long-lived graceful restart to ensure routes are kept for one hour, even if the BGP connection goes down, notably during maintenance. On the receiver side, if you have a Cisco router running IOS XR, you can use the following configuration to blackhole traffic received on the BGP session. As the BGP session is dedicated to this usage, The community is not used, but you can also forward these routes to your transit providers.
router static
 vrf public
  address-family ipv4 unicast
   192.0.2.1/32 Null0 description "BGP blackhole"
  !
  address-family ipv6 unicast
   2001:db8::1/128 Null0 description "BGP blackhole"
  !
 !
!
route-policy blackhole_ipv4_in_public
  if destination in (0.0.0.0/0 le 31) then
    drop
  endif
  set next-hop 192.0.2.1
  done
end-policy
!
route-policy blackhole_ipv6_in_public
  if destination in (::/0 le 127) then
    drop
  endif
  set next-hop 2001:db8::1
  done
end-policy
!
router bgp 12322
 neighbor-group BLACKHOLE_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv4_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
  address-family ipv6 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv6_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
 !
 vrf public
  neighbor 192.0.2.1
   use neighbor-group BLACKHOLE_IPV4_PUBLIC
   description akvorado-1
When the traffic is blackholed, it is still reported by IPFIX and sFlow. In Akvorado, use ForwardingStatus >= 128 as a filter. While this method is compatible with all routers, it makes the attack successful as the target is completely unreachable. If your router supports it, Flowspec can selectively filter flows to stop the attack without impacting the customer.

Flowspec Flowspec is defined in RFC 8955 and enables the transmission of flow specifications in BGP sessions. A flow specification is a set of matching criteria to apply to IP traffic. These criteria include the source and destination prefix, the IP protocol, the source and destination port, and the packet length. Each flow specification is associated with an action, encoded as an extended community: traffic shaping, traffic marking, or redirection. To announce flow specifications with BIRD, we extend our configuration. The extended community used shapes the matching traffic to 0 bytes per second.
flow4 table flowtab4;
flow6 table flowtab6;
protocol bgp exporter  
  flow4  
    import none;
    export where proto = "flowspec4";
   ;
  flow6  
    import none;
    export where proto = "flowspec6";
   ;
  # [ ]
 
protocol static flowspec4  
  flow4;
  route flow4  
    dst 203.0.113.68/32;
    sport = 53;
    length >= 1476 && <= 1500;
    proto = 17;
   
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
   ;
  route flow4  
    dst 203.0.113.206/32;
    sport = 123;
    length = 468;
    proto = 17;
   
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
   ;
 
protocol static flowspec6  
  flow6;
 
If you have a Cisco router running IOS XR, the configuration may look like this:
vrf public
 address-family ipv4 flowspec
 address-family ipv6 flowspec
!
router bgp 12322
 address-family vpnv4 flowspec
 address-family vpnv6 flowspec
 neighbor-group FLOWSPEC_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
  address-family ipv6 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
 !
 vrf public
  address-family ipv4 flowspec
  address-family ipv6 flowspec
  neighbor 192.0.2.1
   use neighbor-group FLOWSPEC_IPV4_PUBLIC
   description akvorado-1
Then, you need to enable Flowspec on all interfaces with:
flowspec
 vrf public
  address-family ipv4
   local-install interface-all
  !
  address-family ipv6
   local-install interface-all
  !
 !
!
As with the RTBH setup, you can filter dropped flows with ForwardingStatus >= 128.

DDoS detection (continued) In the example using Flowspec, the flows were also filtered on the length of the packet:
route flow4  
  dst 203.0.113.68/32;
  sport = 53;
  length >= 1476 && <= 1500;
  proto = 17;
 
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
 ;
This is an important addition: legitimate DNS requests are smaller than this and therefore not filtered.2 With ClickHouse, you can get the 10th and 90th percentiles of the packet sizes with quantiles(0.1, 0.9)(Bytes/Packets). The last issue we need to tackle is how to optimize the request: it may need several seconds to collect the data and it is likely to consume substantial resources from your ClickHouse database. One solution is to create a materialized view to pre-aggregate results:
CREATE TABLE ddos_logs (
  TimeReceived DateTime,
  DstAddr IPv6,
  Proto UInt32,
  SrcPort UInt16,
  Gbps SimpleAggregateFunction(sum, Float64),
  Mpps SimpleAggregateFunction(sum, Float64),
  sources AggregateFunction(uniqCombined(12), IPv6),
  countries AggregateFunction(uniqCombined(12), FixedString(2)),
  size AggregateFunction(quantiles(0.1, 0.9), UInt64)
) ENGINE = SummingMergeTree
PARTITION BY toStartOfHour(TimeReceived)
ORDER BY (TimeReceived, DstAddr, Proto, SrcPort)
TTL toStartOfHour(TimeReceived) + INTERVAL 6 HOUR DELETE ;
CREATE MATERIALIZED VIEW ddos_logs_view TO ddos_logs AS
  SELECT
    toStartOfMinute(TimeReceived) AS TimeReceived,
    DstAddr,
    Proto,
    SrcPort,
    sum(((((Bytes * SamplingRate) * 8) / 1000) / 1000) / 1000) / 60 AS Gbps,
    sum(((Packets * SamplingRate) / 1000) / 1000) / 60 AS Mpps,
    uniqCombinedState(12)(SrcAddr) AS sources,
    uniqCombinedState(12)(SrcCountry) AS countries,
    quantilesState(0.1, 0.9)(toUInt64(Bytes/Packets)) AS size
  FROM flows
  WHERE DstNetRole = 'customers'
  GROUP BY
    TimeReceived,
    DstAddr,
    Proto,
    SrcPort
The ddos_logs table is using the SummingMergeTree engine. When the table receives new data, ClickHouse replaces all the rows with the same sorting key, as defined by the ORDER BY directive, with one row which contains summarized values using either the sum() function or the explicitly specified aggregate function (uniqCombined and quantiles in our example).3 Finally, we can modify our initial query with the following one:
SELECT *
FROM (
  SELECT
    TimeReceived,
    DstAddr,
    dictGetOrDefault('protocols', 'name', Proto, '???') AS Proto,
    SrcPort,
    sum(Gbps) AS Gbps,
    sum(Mpps) AS Mpps,
    uniqCombinedMerge(12)(sources) AS sources,
    uniqCombinedMerge(12)(countries) AS countries,
    quantilesMerge(0.1, 0.9)(size) AS size
  FROM ddos_logs
  WHERE TimeReceived > now() - INTERVAL 60 MINUTE
  GROUP BY
    TimeReceived,
    DstAddr,
    Proto,
    SrcPort
)
WHERE (Gbps > 1)
   OR ((Proto = 'UDP') AND (Gbps > 0.2)) 
   OR ((sources > 20) AND (Gbps > 0.1)) 
   OR ((countries > 10) AND (Gbps > 0.1))
ORDER BY
  TimeReceived DESC,
  Gbps DESC

Gluing everything together To sum up, building an anti-DDoS system requires to following these steps:
  1. define a set of criteria to detect a DDoS attack,
  2. translate these criteria into SQL requests,
  3. pre-aggregate flows into SummingMergeTree tables,
  4. query and transform the results to a BIRD configuration file, and
  5. configure your routers to pull the routes from BIRD.
A Python script like the following one can handle the fourth step. For each attacked target, it generates both a Flowspec rule and a blackhole route.
import socket
import types
from clickhouse_driver import Client as CHClient
# Put your SQL query here!
SQL_QUERY = " "
# How many anti-DDoS rules we want at the same time?
MAX_DDOS_RULES = 20
def empty_ruleset():
    ruleset = types.SimpleNamespace()
    ruleset.flowspec = types.SimpleNamespace()
    ruleset.blackhole = types.SimpleNamespace()
    ruleset.flowspec.v4 = []
    ruleset.flowspec.v6 = []
    ruleset.blackhole.v4 = []
    ruleset.blackhole.v6 = []
    return ruleset
current_ruleset = empty_ruleset()
client = CHClient(host="clickhouse.akvorado.net")
while True:
    results = client.execute(SQL_QUERY)
    seen =  
    new_ruleset = empty_ruleset()
    for (t, addr, proto, port, gbps, mpps, sources, countries, size) in results:
        if (addr, proto, port) in seen:
            continue
        seen[(addr, proto, port)] = True
        # Flowspec
        if addr.ipv4_mapped:
            address = addr.ipv4_mapped
            rules = new_ruleset.flowspec.v4
            table = "flow4"
            mask = 32
            nh = "proto"
        else:
            address = addr
            rules = new_ruleset.flowspec.v6
            table = "flow6"
            mask = 128
            nh = "next header"
        if size[0] == size[1]:
            length = f"length =  int(size[0]) "
        else:
            length = f"length >=  int(size[0])  && <=  int(size[1]) "
        header = f"""
# Time:  t 
# Source:  address , protocol:  proto , port:  port 
# Gbps/Mpps:  gbps:.3 / mpps:.3 , packet size:  int(size[0]) <=X<= int(size[1]) 
# Flows:  flows , sources:  sources , countries:  countries 
"""
        rules.append(
                f""" header 
route  table   
  dst  address / mask ;
  sport =  port ;
   length ;
   nh  =  socket.getprotobyname(proto) ;
 
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
 ;
"""
        )
        # Blackhole
        if addr.ipv4_mapped:
            rules = new_ruleset.blackhole.v4
        else:
            rules = new_ruleset.blackhole.v6
        rules.append(
            f""" header 
route  address / mask  blackhole  
  bgp_community.add((65535, 666));
 ;
"""
        )
        new_ruleset.flowspec.v4 = list(
            set(new_ruleset.flowspec.v4[:MAX_DDOS_RULES])
        )
        new_ruleset.flowspec.v6 = list(
            set(new_ruleset.flowspec.v6[:MAX_DDOS_RULES])
        )
        # TODO: advertise changes by mail, chat, ...
        current_ruleset = new_ruleset
        changes = False
        for rules, path in (
            (current_ruleset.flowspec.v4, "v4-flowspec"),
            (current_ruleset.flowspec.v6, "v6-flowspec"),
            (current_ruleset.blackhole.v4, "v4-blackhole"),
            (current_ruleset.blackhole.v6, "v6-blackhole"),
        ):
            path = os.path.join("/etc/bird/", f" path .conf")
            with open(f" path .tmp", "w") as f:
                for r in rules:
                    f.write(r)
            changes = (
                changes or not os.path.exists(path) or not samefile(path, f" path .tmp")
            )
            os.rename(f" path .tmp", path)
        if not changes:
            continue
        proc = subprocess.Popen(
            ["birdc", "configure"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        stdout, stderr = proc.communicate(None)
        stdout = stdout.decode("utf-8", "replace")
        stderr = stderr.decode("utf-8", "replace")
        if proc.returncode != 0:
            logger.error(
                "  error:\n \n ".format(
                    "birdc reconfigure",
                    "\n".join(
                        [" O:  ".format(line) for line in stdout.rstrip().split("\n")]
                    ),
                    "\n".join(
                        [" E:  ".format(line) for line in stderr.rstrip().split("\n")]
                    ),
                )
            )

Until Akvorado integrates DDoS detection and mitigation, the ideas presented in this blog post provide a solid foundation to get started with your own anti-DDoS system.

  1. ClickHouse can export results using Markdown format when appending FORMAT Markdown to the query.
  2. While most DNS clients should retry with TCP on failures, this is not always the case: until recently, musl libc did not implement this.
  3. The materialized view also aggregates the data at hand, both for efficiency and to ensure we work with the right data types.

26 February 2023

Russ Allbery: Review: An Informal History of the Hugos

Review: An Informal History of the Hugos, by Jo Walton
Publisher: Tor
Copyright: August 2018
ISBN: 1-4668-6573-3
Format: Kindle
Pages: 564
An Informal History of the Hugos is another collection of Jo Walton's Tor.com posts. As with What Makes This Book So Great, these are blog posts that are still available for free on-line. Unlike that collection, this series happened after Tor.com got better at tags, so it's much easier to find. Whether to buy it therefore depends on whether having it in convenient book form is worth it to you. Walton's previous collection was a somewhat random assortment of reviews of whatever book she felt like reviewing. As you may guess from the title, this one is more structured. She starts at the first year that the Hugo Awards were given out (1953) and discusses the winners for each year up through 2000. Nearly all of that discussion is about the best novel Hugo, a survey of other good books for that year, and, when other awards (Nebula, Locus, etc.) start up, comparing them to the winners and nominees of other awards. One of the goals of each discussion is to decide whether the Hugo nominees did a good job of capturing the best books of the year and the general feel of the genre at that time. There are a lot of pages in this book, but that's partly because there's a lot of filler. Each post includes all of the winners and (once a nomination system starts) nominees in every Hugo category. Walton offers an in-depth discussion of the novel in every year, and an in-depth discussion of the John W. Campbell Award for Best New Writer (technically not a Hugo but awarded with them and voted on in the same way) once those start. Everything else gets a few sentences at most, so it's mostly just lists, all of which you can readily find elsewhere if you cared. Personally, I would have omitted categories without commentary when this was edited into book form. Two other things are included in this book. Most helpfully, Walton's Tor.com reviews of novels in the shortlist are included after the discussion of that year. If you like Walton's reviews, this is great for all the reasons that What Makes This Book So Great was so much fun. Walton has a way of talking about books with infectious enthusiasm, brief but insightful technical analysis, and a great deal of genre context without belaboring any one point. They're concise and readable and never outlast my attention span, and I wish I could write reviews half as well. The other inclusion is a selection of the comments from the original blog posts. When these posts originally ran, they turned into a community discussion of the corresponding year of SF, and Tor included a selection of those comments in the book. Full disclosure: one of those comments is mine, about the way that cyberpunk latched on to some incorrect ideas of how computers work and made them genre conventions to such a degree that most cyberpunk takes place in a parallel universe with very different computer technology. (I suppose that technically makes me a published author to the tune of a couple of pages.) While I still largely agree with the comment, I blamed Neuromancer for this at the time, and embarrassingly discovered when re-reading it that I had been unfair. This is why one should never express opinions in public where someone might record them. Anyway, there is a general selection of comments from random people, but the vast majority of the comments are discussions of the year's short fiction by Rich Horton and Gardner Dozois. I understand why this was included; Walton doesn't talk about the short fiction, Dozois was a legendary SF short fiction editor and multiple Hugo winner, and both Horton and Dozois reviewed short fiction for Locus. But they don't attempt reviews. For nearly all stories under discussion, unless you recognized the title, you would have no idea even what sub-genre it was in. It's just a sequence of assertions about which title or author was better. Given that there are (in most years) three short fiction categories to the one novel category and both Horton and Dozois write about each category, I suspect there are more words in this book from Horton and Dozois than Walton. That's a problem when those comments turn into tedious catalogs. Reviewing short fiction, particularly short stories, is inherently difficult. I've tried to do a lot of that myself, and it's tricky to find something useful to say that doesn't spoil the story. And to be fair to Horton and Dozois, they weren't being paid to write reviews; they were just commenting on blog posts as part of a community conversation, and I doubt anyone thought this would turn into a book. But when read as a book, their inclusion in this form wasn't my favorite editorial decision. This is therefore a collection of Walton's commentary on the selections for best novel and best new writer alongside a whole lot of boring lists. In theory, the padding shouldn't matter; one can skip over it and just read Walton's parts, and that's still lots of material. But Walton's discussion of the best novels of the year also tends to turn into long lists of books with no commentary (particularly once the very-long Locus recommended list starts appearing), adding to the tedium. This collection requires a lot of skimming. I enjoyed this series of blog posts when they were first published, but even at the time I skimmed the short fiction comments. Gathered in book form with this light of editing, I think it was less successful. If you are curious about the history of science fiction awards and never read the original posts, you may enjoy this, but I would rather have read another collection of straight reviews. Rating: 6 out of 10

25 February 2023

Gregor Herrmann: demo video: dpt(1) in pkg-perl-tools

in the Debian Perl Group we are maintaining a lot of packages (around 4000 at the time of writing). this also means that we are spending some time on improving our tools which allow us to handle this amount of packages in a reasonable time. many of the tools are shipped in the pkg-perl-tools package since 2013, & lots of them are scripts which are called as subcommands of the dpt(1) wrapper script. in the last years I got the impression that not all team members are aware of all the useful tools, & that some more promotion might be called for. & last week I was in the mood for creating a short demo video to showcase how I use some dpt(1) subcommands when updating a package to a new upstream release. (even though I prefer text over videos myself :)) probably not a cinematographic masterpiece but as the feedback of a few viewers has been positive, I'm posting it here as well: (direct link as planets ignore iframes )

17 February 2023

Jonathan McDowell: First impressions of the VisionFive 2

VisionFive 2 packaging Back in September last year I chose to back the StarFive VisionFive 2 on Kickstarter. I don t have a particular use in mind for it, but I felt it was one of the first RISC-V systems that were relatively capable (mentally I have it as somewhere between a Raspberry Pi 3 + a Pi 4). In particular it s a quad 1.5GHz 64-bit RISC-V core with 8G RAM, USB3, GigE ethernet and a single M.2 PCIe slot. More than ample as a personal machine for playing around with RISC-V and doing local builds. I ended up paying 67 for the Early Bird variant (dual GigE ethernet rather than 1 x 100Mb and 1 x GigE). A couple of weeks ago I got an email with a tracking number and last week it finally turned up. Being impatient the first thing I did was plug it into a monitor, connect up a keyboard, and power it on. Nothing except some flashing lights. Looking at the boot selector DIP switches suggested it was configured to boot from UART, so I flipped them to (what I thought was) the flash setting. It wasn t - turns out the ON marking on the switches represents logic 0 and it was correctly setup when I got it. I went to read the documentation which talked about writing an image to a MicroSD card, but also had details of the UART connection. Wanting to make sure the device was at least doing something before I actually tried an OS on it I hooked up a USB/serial dongle and powered the board up again. Success! U-Boot appeared and I could interact with it. I went to the VisionFive2 Debian page and proceeded to torrent the Image-69 image, writing it to a MicroSD card and inserting it in the slot on the bottom of the board. It booted fine. I can t even tell you what graphical environment it booted up because I don t remember; it worked fine though (at 1080p, I ve seen reports that 4K screens will make it croak). Poking around the image revealed that it s built off a snapshot.debian.org snapshot from 20220616T194833Z, which is a little dated at this point but I understand the rationale behind picking something that works and sticking with it. The kernel is of course a vendor special, based on 5.15.0. Further investigation revealed that the entire X/graphics stack is living in /usr/local, which isn t overly surprising; it s Imagination based. I was pleasantly surprised to discover there is work to upstream the Imagination support, but I m not planning to run the board with a monitor attached so it s not a high priority for me. Having discovered all that I decided to see how well a clean Debian unstable install from Debian Ports would go. I had a spare Intel Optane lying around (it s a stupid 22110 M.2 which is too long for any machine I own), so I put it in the slot on the bottom of the board. To my surprise it Just Worked and was detected ok:
# lspci
0000:00:00.0 PCI bridge: PLDA XpressRich-AXI Ref Design (rev 02)
0000:01:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01)
0001:00:00.0 PCI bridge: PLDA XpressRich-AXI Ref Design (rev 02)
0001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [Optane]
I created a single partition with an ext4 filesystem (initially tried btrfs, but the StarFive kernel doesn t support it), and kicked off a debootstrap with:
# mkfs -t ext4 /dev/nvme0n1p1
# mount /dev/nvme0n1p1 /mnt
# debootstrap --keyring=/etc/apt/trusted.gpg.d/debian-ports-archive-2023.gpg \
	unstable /mnt https://deb.debian.org/debian-ports
The u-boot setup has a convoluted set of vendor scripts that eventually ends up reading a /boot/extlinux/extlinux.conf config from /dev/mmcblk1p2, so I added an additional entry there using the StarFive kernel but pointing to the NVMe device for /. Made sure to set a root password (not that I ve been bitten by that before, too many times), and rebooted. Success! Well. Sort of. I hit a bunch of problems with having a getty running on ttyS0 as well as one running on hvc0. The second turns out to be a console device from the RISC-V SBI. I did a systemctl mask serial-getty@hvc0.service which made things a bit happier, but I was still seeing odd behaviour and output. Turned out I needed to reboot the initramfs as well; the StarFive one was using Plymouth and doing some other stuff that seemed to be confusing matters. An update-initramfs -k 5.15.0-starfive -c built me a new one and everything was happy. Next problem; the StarFive kernel doesn t have IPv6 support. StarFive are good citizens and make their 5.15 kernel tree available, so I grabbed it, fed it the existing config, and tweaked some options (including adding IPV6 and SECCOMP, which chrony wanted). Slight hiccup when it turned out trying to do things like make sound modular caused it to fail to compile, and having to backport the fix that allowed the use of GCC 12 (as present in sid), but it got there. So I got cocky and tried to update it to the latest 5.15.94. A few manual merge fixups (which I may or may not have got right, but it compiles and boots for me), and success. Timings:
$ time make -j 4 bindeb-pkg
  [linux-image-5.15.94-00787-g1fbe8ac32aa8]
real	37m0.134s
user	117m27.392s
sys	6m49.804s
On the subject of kernels I am pleased to note that there are efforts to upstream the VisionFive 2 support, with what appears to be multiple members of StarFive engaging in multiple patch submission rounds. It s really great to see this and I look forward to being able to run an unmodified mainline kernel on my board. Niggles? I have a few. The provided u-boot doesn t have NVMe support enabled, so at present I need to keep a MicroSD card to boot off, even though root is on an SSD. I m also seeing some errors in dmesg from the SSD:
[155933.434038] nvme nvme0: I/O 436 QID 4 timeout, completion polled
[156173.351166] nvme nvme0: I/O 48 QID 3 timeout, completion polled
[156346.228993] nvme nvme0: I/O 108 QID 3 timeout, completion polled
It doesn t seem to cause any actual issues, and it could be the SSD, the 5.15 kernel or an actual hardware thing - I ll keep an eye on it (I will probably end up with a different SSD that actually fits, so that ll provide another data point). More annoying is the temperature the CPU seems to run at. There s no heatsink or fan, just the metal heatspreader on top of the CPU, and in normal idle operation it sits at around 60 C. Compiling a kernel it hit 90 C before I stopped the job and sorted out some additional cooling in the form of a desk fan, which kept it as just over 30 C. Bare VisionFive 2 SBC board with a small desk fan pointed at it I haven t seen any actual stability problems, but I wouldn t want to run for any length of time like that. I ve ordered a heatsink and also realised that the board supports a Raspberry Pi style PoE Hat , so I ve got one of those that includes a fan ordered (I am a complete convert to PoE especially for small systems like this). With the desk fan setup I ve been able to run the board for extended periods under load (I did a full recompile of the Debian 6.1.12-1 kernel package and it took about 10 hours). The M.2 slot is unfortunately only a single PCIe v2 lane, and my testing topped out at about 180MB/s. IIRC that is about half what the slot should be capable of, and less than a 10th of what the SSD can do. Ethernet testing with iPerf3 sustained about 941Mb/s, so basically maxing out the port. The board as a whole isn t going to set any speed records, but it s perfectly usable, and pretty impressive for the price point. On the Debian side I ve not hit any surprises. There s work going on to move RISC-V to a proper release architecture, and I m hoping to be able to help out with that, but the version of unstable I installed from the ports infrastructure has looked just like any other Debian install. Which is what you want. And that pretty much sums up my overall experience of the VisionFive 2; it s not noticeably different than any other single board computer. That s a good thing, FWIW, and once the kernel support lands properly upstream (it ll be post 6.3 at least it seems) it ll be a boring mainline supported platform that just happens to be RISC-V.

15 February 2023

Marco d'Itri: I replaced grub with systemd-boot

To be able to investigate and work on the the measured boot features I have switched from grub to systemd-boot (sd-boot). This initial step is optional, but it is useful because this way /etc/kernel/cmdline will become the new place where the kernel command line can be configured:
. /etc/default/grub
echo "root=/dev/mapper/root $GRUB_CMDLINE_LINUX $GRUB_CMDLINE_LINUX_DEFAULT" > /etc/kernel/cmdline
Do not forget to set the correct root file system there, because initramfs-tools does not support discovering it at boot time using the Discoverable Partitions Specification. The installation has been automated since systemd version 252.6-1, so installing the package has the effect of installing sd-boot in the ESP, enabling it in the UEFI boot sequence and then creating boot loader entries for the kernels already installed on the system:
apt install systemd-boot
If needed, it could be manually installed again just by running bootctl install. I like to show the boot menu by default, at least until I will be more familiar with sd-boot:
bootctl set-timeout 4
Since other UEFI binaries can be easily chainloaded, I am also going to keep around grub for a while, just to be sure:
cat <<END > /boot/efi/loader/entries/grub.conf
title Grub
linux /EFI/debian/grubx64.efi
END
At this point sd-boot works, but I still had to enable secure boot. So far sd-boot has not been signed with a Debian key known to the shim bootloader, so I needed to create a Machine Owner Key (MOK), enroll it in UEFI and then use it to sign everything. I dislike the complexity of mokutil and the other related programs, so after removing it and the boot shim I have decided to use sbctl instead. With it I easily created new keys, enrolled them in the EFI key store and then signed everything:
sbctl create-keys
sbctl enroll-keys
for file in /boot/efi/*/*/linux /boot/efi/EFI/*/*.efi; do
  sbctl sign -s $file
done
Since there is no sbctl package yet I need to make sure that also the kernels installed in the future will be automatically signed, so I have created a trivial script in /etc/kernel/install.d/ which automatically runs sbctl sign -s or sbctl remove-file. The Debian wiki SecureBoot page documents how do do this with mokutil and sbsigntool, but I think that sbctl is much friendlier. Since I am not using the boot shim, I also had to set DisableShimForSecureBoot=true in /etc/fwupd/uefi_capsule.conf to make firmware updates work automatically. As a bonus, I have also added to the boot menu the excellent Debian-based GRML live distribution. Since sd-boot is not capable of loopback-mounting CD-ROM images like grub, I first had to extract the kernel and initramfs and copy them to the ESP:
mount -o loop /boot/grml/grml64-full_2022.11.iso /mnt/
mkdir /boot/efi/grml/
cp /mnt/boot/grml64full/* /boot/efi/grml/
umount /mnt/
cat <<END > /boot/efi/loader/entries/grml.conf
title GRML
linux /grml/vmlinuz
initrd /grml/initrd.img
options boot=live bootid=grml64full202211 findiso=/grml/grml64-full_2022.11.iso live-media-path=/live/grml64-full net.ifnames=0 
END
As expected, after a reboot bootctl reports the new security features:
System:
      Firmware: UEFI 2.70 (Lenovo 0.4496)
 Firmware Arch: x64
   Secure Boot: enabled (user)
  TPM2 Support: yes
  Boot into FW: supported
Current Boot Loader:
      Product: systemd-boot 252.5-2
     Features:   Boot counting
                 Menu timeout control
                 One-shot menu timeout control
                 Default entry control
                 One-shot entry control
                 Support for XBOOTLDR partition
                 Support for passing random seed to OS
                 Load drop-in drivers
                 Support Type #1 sort-key field
                 Support @saved pseudo-entry
                 Support Type #1 devicetree field
                 Boot loader sets ESP information
          ESP: /dev/disk/by-partuuid/1b767f8e-70fa-5a48-b444-cfe5c272d66e
         File:  /EFI/systemd/systemd-bootx64.efi
...
Relevant documentation:

9 February 2023

Jonathan McDowell: Building a read-only Debian root setup: Part 2

This is the second part of how I build a read-only root setup for my router. You might want to read part 1 first, which covers the initial boot and general overview of how I tie the pieces together. This post will describe how I build the squashfs image that forms the main filesystem. Most of the build is driven from a script, make-router, which I ll dissect below. It s highly tailored to my needs, and this is a fairly lengthy post, but hopefully the steps I describe prove useful to anyone trying to do something similar.
Breakdown of make-router
#!/bin/bash
# Either rb3011 (arm) or rb5009 (arm64)
#HOSTNAME="rb3011"
HOSTNAME="rb5009"
if [ "x$ HOSTNAME " == "xrb3011" ]; then
	ARCH=armhf
elif [ "x$ HOSTNAME " == "xrb5009" ]; then
	ARCH=arm64
else
	echo "Unknown host: $ HOSTNAME "
	exit 1
fi

It s a bash script, and I allow building for either my RB3011 or RB5009, which means a different architecture (32 vs 64 bit). I run this script on my Pi 4 which means I don t have to mess about with QemuUserEmulation.
BASE_DIR=$(dirname $0)
IMAGE_FILE=$(mktemp --tmpdir router.$ ARCH .XXXXXXXXXX.img)
MOUNT_POINT=$(mktemp -p /mnt -d router.$ ARCH .XXXXXXXXXX)
# Build and mount an ext4 image file to put the root file system in
dd if=/dev/zero bs=1 count=0 seek=1G of=$ IMAGE_FILE 
mkfs -t ext4 $ IMAGE_FILE 
mount -o loop $ IMAGE_FILE  $ MOUNT_POINT 

I build the image in a loopback ext4 file on tmpfs (my Pi4 is the 8G model), which makes things a bit faster.
# Add dpkg excludes
mkdir -p $ MOUNT_POINT /etc/dpkg/dpkg.cfg.d/
cat <<EOF > $ MOUNT_POINT /etc/dpkg/dpkg.cfg.d/path-excludes
# Exclude docs
path-exclude=/usr/share/doc/*
# Only locale we want is English
path-exclude=/usr/share/locale/*
path-include=/usr/share/locale/en*/*
path-include=/usr/share/locale/locale.alias
# No man pages
path-exclude=/usr/share/man/*
EOF

Create a dpkg excludes config to drop docs, man pages and most locales before we even start the bootstrap.
# Setup fstab + mtab
echo "# Empty fstab as root is pre-mounted" > $ MOUNT_POINT /etc/fstab
ln -s ../proc/self/mounts $ MOUNT_POINT /etc/mtab
# Setup hostname
echo $ HOSTNAME  > $ MOUNT_POINT /etc/hostname
# Add the root SSH keys
mkdir -p $ MOUNT_POINT /root/.ssh/
cat <<EOF > $ MOUNT_POINT /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv8NkUeVdsVdegS+JT9qwFwiHEgcC9sBwnv6RjpH6I4d3im4LOaPOatzneMTZlH8Gird+H4nzluciBr63hxmcFjZVW7dl6mxlNX2t/wKvV0loxtEmHMoI7VMCnrWD0PyvwJ8qqNu9cANoYriZRhRCsBi27qPNvI741zEpXN8QQs7D3sfe4GSft9yQplfJkSldN+2qJHvd0AHKxRdD+XTxv1Ot26+ZoF3MJ9MqtK+FS+fD9/ESLxMlOpHD7ltvCRol3u7YoaUo2HJ+u31l0uwPZTqkPNS9fkmeCYEE0oXlwvUTLIbMnLbc7NKiLgniG8XaT0RYHtOnoc2l2UnTvH5qsQ== noodles@earth.li
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDQb9+qFemcwKhey3+eTh5lxp+3sgZXW2HQQEZMt9hPvVXk+MiiNMx9WUzxPJnwXqlmmVdKsq+AvjA0i505Pp8fIj5DdUBpSqpLghmzpnGuob7SSwXYj+352hjD52UC4S0KMKbIaUpklADgsCbtzhYYc4WoO8F7kK63tS5qa1XSZwwRwPbYOWBcNocfr9oXCVWD9ismO8Y0l75G6EyW8UmwYAohDaV83pvJxQerYyYXBGZGY8FNjqVoOGMRBTUcLj/QTo0CDQvMtsEoWeCd0xKLZ3gjiH3UrknkaPra557/TWymQ8Oh15aPFTr5FvKgAlmZaaM0tP71SOGmx7GpCsP4jZD1Xj/7QMTAkLXb+Ou6yUOVM9J4qebdnmF2RGbf1bwo7xSIX6gAYaYgdnppuxqZX1wyAy+A2Hie4tUjMHKJ6OoFwBsV1sl+3FobrPn6IuulRCzsq2aLqLey+PHxuNAYdSKo7nIDB3qCCPwHlDK52WooSuuMidX4ujTUw7LDTia9FxAawudblxbrvfTbg3DsiDBAOAIdBV37HOAKu3VmvYSPyqT80DEy8KFmUpCEau59DID9VERkG6PWPVMiQnqgW2Agn1miOBZeIQV8PFjenAySxjzrNfb4VY/i/kK9nIhXn92CAu4nl6D+VUlw+IpQ8PZlWlvVxAtLonpjxr9OTw== noodles@yubikey
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0I8UHj4IpfqUcGE4cTvLB0d2xmATSUzqtxW6ZhGbZxvQDKJesVW6HunrJ4NFTQuQJYgOXY/o82qBpkEKqaJMEFHTCjcaj3M6DIaxpiRfQfs0nhtzDB6zPiZn9Suxb0s5Qr4sTWd6iI9da72z3hp9QHNAu4vpa4MSNE+al3UfUisUf4l8TaBYKwQcduCE0z2n2FTi3QzmlkOgH4MgyqBBEaqx1tq7Zcln0P0TYZXFtrxVyoqBBIoIEqYxmFIQP887W50wQka95dBGqjtV+d8IbrQ4pB55qTxMd91L+F8n8A6nhQe7DckjS0Xdla52b9RXNXoobhtvx9K2prisagsHT noodles@cup
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBK6iGog3WbNhrmrkglNjVO8/B6m7mN6q1tMm1sXjLxQa+F86ETTLiXNeFQVKCHYrk8f7hK0d2uxwgj6Ixy9k0Cw= noodles@sevai
EOF

Setup fstab, the hostname and SSH keys for root.
# Bootstrap our install
debootstrap \
	--arch=$ ARCH  \
	--include=collectd-core,conntrack,dnsmasq,ethtool,iperf3,kexec-tools,mosquitto,mtd-utils,mtr-tiny,ppp,tcpdump,rng-tools5,ssh,watchdog,wget \
	--exclude=dmidecode,isc-dhcp-client,isc-dhcp-common,makedev,nano \
	bullseye $ MOUNT_POINT  https://deb.debian.org/debian/

Actually do the debootstrap step, including a bunch of extra packages that we want.
# Install mqtt-arp
cp $ BASE_DIR /debs/mqtt-arp_1_$ ARCH .deb $ MOUNT_POINT /tmp
chroot $ MOUNT_POINT  dpkg -i /tmp/mqtt-arp_1_$ ARCH .deb
rm $ MOUNT_POINT /tmp/mqtt-arp_1_$ ARCH .deb
# Frob the mqtt-arp config so it starts after mosquitto
sed -i -e 's/After=.*/After=mosquitto.service/' $ MOUNT_POINT /lib/systemd/system/mqtt-arp.service

I haven t uploaded mqtt-arp to Debian, so I install a locally built package, and ensure it starts after mosquitto (the MQTT broker), given they re running on the same host.
# Frob watchdog so it starts earlier than multi-user
sed -i -e 's/After=.*/After=basic.target/' $ MOUNT_POINT /lib/systemd/system/watchdog.service
# Make sure the watchdog is poking the device file
sed -i -e 's/^#watchdog-device/watchdog-device/' $ MOUNT_POINT /etc/watchdog.conf

watchdog timeouts were particularly an issue on the RB3011, where the default timeout didn t give enough time to reach multiuser mode before it would reset the router. Not helpful, so alter the config to start it earlier (and make sure it s configured to actually kick the device file).
# Clean up docs + locales
rm -r $ MOUNT_POINT /usr/share/doc/*
rm -r $ MOUNT_POINT /usr/share/man/*
for dir in $ MOUNT_POINT /usr/share/locale/*/; do
	if [ "$ dir " != "$ MOUNT_POINT /usr/share/locale/en/" ]; then
		rm -r $ dir 
	fi
done

Clean up any docs etc that ended up installed.
# Set root password to root
echo "root:root"   chroot $ MOUNT_POINT  chpasswd

The only login method is ssh key to the root account though I suppose this allows for someone to execute a privilege escalation from a daemon user so I should probably randomise this. Does need to be known though so it s possible to login via the serial console for debugging.
# Add security to sources.list + update
echo "deb https://security.debian.org/debian-security bullseye-security main" >> $ MOUNT_POINT /etc/apt/sources.list
chroot $ MOUNT_POINT  apt update
chroot $ MOUNT_POINT  apt -y full-upgrade
chroot $ MOUNT_POINT  apt clean
# Cleanup the APT lists
rm $ MOUNT_POINT /var/lib/apt/lists/www.*
rm $ MOUNT_POINT /var/lib/apt/lists/security.*

Pull in any security updates, then clean out the APT lists rather than polluting the image with them.
# Disable the daily APT timer
rm $ MOUNT_POINT /etc/systemd/system/timers.target.wants/apt-daily.timer
# Disable daily dpkg backup
cat <<EOF > $ MOUNT_POINT /etc/cron.daily/dpkg
#!/bin/sh
# Don't do the daily dpkg backup
exit 0
EOF
# We don't want a persistent systemd journal
rmdir $ MOUNT_POINT /var/log/journal

None of these make sense on a router.
# Enable nftables
ln -s /lib/systemd/system/nftables.service \
	$ MOUNT_POINT /etc/systemd/system/sysinit.target.wants/nftables.service

Ensure we have firewalling enabled automatically.
# Add systemd-coredump + systemd-timesync user / group
echo "systemd-timesync:x:998:" >> $ MOUNT_POINT /etc/group
echo "systemd-coredump:x:999:" >> $ MOUNT_POINT /etc/group
echo "systemd-timesync:!*::" >> $ MOUNT_POINT /etc/gshadow
echo "systemd-coredump:!*::" >> $ MOUNT_POINT /etc/gshadow
echo "systemd-timesync:x:998:998:systemd Time Synchronization:/:/usr/sbin/nologin" >> $ MOUNT_POINT /etc/passwd
echo "systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin" >> $ MOUNT_POINT /etc/passwd
echo "systemd-timesync:!*:47358::::::" >> $ MOUNT_POINT /etc/shadow
echo "systemd-coredump:!*:47358::::::" >> $ MOUNT_POINT /etc/shadow
# Create /etc/.pwd.lock, otherwise it'll end up in the overlay
touch $ MOUNT_POINT /etc/.pwd.lock
chmod 600 $ MOUNT_POINT /etc/.pwd.lock

Create a number of users that will otherwise get created at boot, and a lock file that will otherwise get created anyway.
# Copy config files
cp --recursive --preserve=mode,timestamps $ BASE_DIR /etc/* $ MOUNT_POINT /etc/
cp --recursive --preserve=mode,timestamps $ BASE_DIR /etc-$ ARCH /* $ MOUNT_POINT /etc/
chroot $ MOUNT_POINT  chown mosquitto /etc/mosquitto/mosquitto.users
chroot $ MOUNT_POINT  chown mosquitto /etc/ssl/mqtt.home.key

There are config files that are easier to replace wholesale, some of which are specific to the hardware (e.g. related to network interfaces). See below for some more details.
# Build symlinks into flash for boot / modules
ln -s /mnt/flash/lib/modules $ MOUNT_POINT /lib/modules
rmdir $ MOUNT_POINT /boot
ln -s /mnt/flash/boot $ MOUNT_POINT /boot

The kernel + its modules live outside the squashfs image, on the USB flash drive that the image lives on. That makes for easier kernel upgrades.
# Put our git revision into os-release
echo -n "GIT_VERSION=" >> $ MOUNT_POINT /etc/os-release
(cd $ BASE_DIR  ; git describe --tags) >> $ MOUNT_POINT /etc/os-release

Always helpful to be able to check the image itself for what it was built from.
# Add some stuff to root's .bashrc
cat << EOF >> $ MOUNT_POINT /root/.bashrc
alias ls='ls -F --color=auto'
eval "\$(dircolors)"
case "\$TERM" in
xterm* rxvt*)
	PS1="\\[\\e]0;\\u@\\h: \\w\a\\]\$PS1"
	;;
*)
	;;
esac
EOF

Just some niceties for when I do end up logging in.
# Build the squashfs
mksquashfs $ MOUNT_POINT  /tmp/router.$ ARCH .squashfs \
	-comp xz

Actually build the squashfs image.
# Save the installed package list off
chroot $ MOUNT_POINT  dpkg --get-selections > /tmp/wip-installed-packages

Save off the installed package list. This was particularly useful when trying to replicate the existing router setup and making sure I had all the important packages installed. It doesn t really serve a purpose now.
In terms of the config files I copy into /etc, shared across both routers are the following:
Breakdown of shared config
  • apt config (disable recommends, periodic updates):
    • apt/apt.conf.d/10periodic, apt/apt.conf.d/local-recommends
  • Adding a default, empty, locale:
    • default/locale
  • DNS/DHCP:
    • dnsmasq.conf, dnsmasq.d/dhcp-ranges, dnsmasq.d/static-ips
    • hosts, resolv.conf
  • Enabling IP forwarding:
    • sysctl.conf
  • Logs related:
    • logrotate.conf, rsyslog.conf
  • MQTT related:
    • mosquitto/mosquitto.users, mosquitto/conf.d/ssl.conf, mosquitto/conf.d/users.conf, mosquitto/mosquitto.acl, mosquitto/mosquitto.conf
    • mqtt-arp.conf
    • ssl/lets-encrypt-r3.crt, ssl/mqtt.home.key, ssl/mqtt.home.crt
  • PPP configuration:
    • ppp/ip-up.d/0000usepeerdns, ppp/ipv6-up.d/defaultroute, ppp/pap-secrets, ppp/chap-secrets
    • network/interfaces.d/pppoe-wan
The router specific config is mostly related to networking:
Breakdown of router specific config
  • Firewalling:
    • nftables.conf
  • Interfaces:
    • dnsmasq.d/interfaces
    • network/interfaces.d/eth0, network/interfaces.d/p1, network/interfaces.d/p2, network/interfaces.d/p7, network/interfaces.d/p8
  • PPP config (network interface piece):
    • ppp/peers/aquiss
  • SSH keys:
    • ssh/ssh_host_ecdsa_key, ssh/ssh_host_ed25519_key, ssh/ssh_host_rsa_key, ssh/ssh_host_ecdsa_key.pub, ssh/ssh_host_ed25519_key.pub, ssh/ssh_host_rsa_key.pub
  • Monitoring:
    • collectd/collectd.conf, collectd/collectd.conf.d/network.conf

6 February 2023

Vincent Bernat: Fast and dynamic encoding of Protocol Buffers in Go

Protocol Buffers are a popular choice for serializing structured data due to their compact size, fast processing speed, language independence, and compatibility. There exist other alternatives, including Cap n Proto, CBOR, and Avro. Usually, data structures are described in a proto definition file (.proto). The protoc compiler and a language-specific plugin convert it into code:
$ head flow-4.proto
syntax = "proto3";
package decoder;
option go_package = "akvorado/inlet/flow/decoder";
message FlowMessagev4  
  uint64 TimeReceived = 2;
  uint32 SequenceNum = 3;
  uint64 SamplingRate = 4;
  uint32 FlowDirection = 5;
$ protoc -I=. --plugin=protoc-gen-go --go_out=module=akvorado:. flow-4.proto
$ head inlet/flow/decoder/flow-4.pb.go
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
//      protoc-gen-go v1.28.0
//      protoc        v3.21.12
// source: inlet/flow/data/schemas/flow-4.proto
package decoder
import (
        protoreflect "google.golang.org/protobuf/reflect/protoreflect"
Akvorado collects network flows using IPFIX or sFlow, decodes them with GoFlow2, encodes them to Protocol Buffers, and sends them to Kafka to be stored in a ClickHouse database. Collecting a new field, such as source and destination MAC addresses, requires modifications in multiple places, including the proto definition file and the ClickHouse migration code. Moreover, the cost is paid by all users.1 It would be nice to have an application-wide schema and let users enable or disable the fields they need. While the main goal is flexibility, we do not want to sacrifice performance. On this front, this is quite a success: when upgrading from 1.6.4 to 1.7.1, the decoding and encoding performance almost doubled!
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              initial.txt                 final.txt               
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12      12.963    2%   7.836    1%  -39.55% (p=0.000 n=10)
Sflow/with_encoding-12         19.37    1%   10.15    2%  -47.63% (p=0.000 n=10)

Faster Protocol Buffers encoding I use the following code to benchmark both the decoding and encoding process. Initially, the Decode() method is a thin layer above GoFlow2 producer and stores the decoded data into the in-memory structure generated by protoc. Later, some of the data will be encoded directly during flow decoding. This is why we measure both the decoding and the encoding.2
func BenchmarkDecodeEncodeSflow(b *testing.B)  
    r := reporter.NewMock(b)
    sdecoder := sflow.New(r)
    data := helpers.ReadPcapPayload(b,
        filepath.Join("decoder", "sflow", "testdata", "data-1140.pcap"))
    for _, withEncoding := range []bool true, false   
        title := map[bool]string 
            true:  "with encoding",
            false: "without encoding",
         [withEncoding]
        var got []*decoder.FlowMessage
        b.Run(title, func(b *testing.B)  
            for i := 0; i < b.N; i++  
                got = sdecoder.Decode(decoder.RawFlow 
                    Payload: data,
                    Source: net.ParseIP("127.0.0.1"),
                 )
                if withEncoding  
                    for _, flow := range got  
                        buf := []byte 
                        buf = protowire.AppendVarint(buf, uint64(proto.Size(flow)))
                        proto.MarshalOptions .MarshalAppend(buf, flow)
                     
                 
             
         )
     
 
The canonical Go implementation for Protocol Buffers, google.golang.org/protobuf is not the most efficient one. For a long time, people were relying on gogoprotobuf. However, the project is now deprecated. A good replacement is vtprotobuf.3
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              initial.txt               bench-2.txt              
                                sec/op        sec/op     vs base                 
Netflow/with_encoding-12      12.96    2%   10.28    2%  -20.67% (p=0.000 n=10)
Netflow/without_encoding-12   8.935    2%   8.975    2%        ~ (p=0.143 n=10)
Sflow/with_encoding-12        19.37    1%   16.67    2%  -13.93% (p=0.000 n=10)
Sflow/without_encoding-12     14.62    3%   14.87    1%   +1.66% (p=0.007 n=10)

Dynamic Protocol Buffers encoding We have our baseline. Let s see how to encode our Protocol Buffers without a .proto file. The wire format is simple and rely a lot on variable-width integers. Variable-width integers, or varints, are an efficient way of encoding unsigned integers using a variable number of bytes, from one to ten, with small values using fewer bytes. They work by splitting integers into 7-bit payloads and using the 8th bit as a continuation indicator, set to 1 for all payloads except the last.
Variable-width integers encoding in Protocol Buffers: conversion of 150 to a varint
Variable-width integers encoding in Protocol Buffers
For our usage, we only need two types: variable-width integers and byte sequences. A byte sequence is encoded by prefixing it by its length as a varint. When a message is encoded, each key-value pair is turned into a record consisting of a field number, a wire type, and a payload. The field number and the wire type are encoded as a single variable-width integer called a tag.
Message encoded with Protocol Buffers: three varints, two sequences of bytes
Message encoded with Protocol Buffers
We use the following low-level functions to build the output buffer: Our schema abstraction contains the appropriate information to encode a message (ProtobufIndex) and to generate a proto definition file (fields starting with Protobuf):
type Column struct  
    Key       ColumnKey
    Name      string
    Disabled  bool
    // [ ]
    // For protobuf.
    ProtobufIndex    protowire.Number
    ProtobufType     protoreflect.Kind // Uint64Kind, Uint32Kind,  
    ProtobufEnum     map[int]string
    ProtobufEnumName string
    ProtobufRepeated bool
 
We have a few helper methods around the protowire functions to directly encode the fields while decoding the flows. They skip disabled fields or non-repeated fields already encoded. Here is an excerpt of the sFlow decoder:
sch.ProtobufAppendVarint(bf, schema.ColumnBytes, uint64(recordData.Base.Length))
sch.ProtobufAppendVarint(bf, schema.ColumnProto, uint64(recordData.Base.Protocol))
sch.ProtobufAppendVarint(bf, schema.ColumnSrcPort, uint64(recordData.Base.SrcPort))
sch.ProtobufAppendVarint(bf, schema.ColumnDstPort, uint64(recordData.Base.DstPort))
sch.ProtobufAppendVarint(bf, schema.ColumnEType, helpers.ETypeIPv4)
For fields that are required later in the pipeline, like source and destination addresses, they are stored unencoded in a separate structure:
type FlowMessage struct  
    TimeReceived uint64
    SamplingRate uint32
    // For exporter classifier
    ExporterAddress netip.Addr
    // For interface classifier
    InIf  uint32
    OutIf uint32
    // For geolocation or BMP
    SrcAddr netip.Addr
    DstAddr netip.Addr
    NextHop netip.Addr
    // Core component may override them
    SrcAS     uint32
    DstAS     uint32
    GotASPath bool
    // protobuf is the protobuf representation for the information not contained above.
    protobuf      []byte
    protobufSet   bitset.BitSet
 
The protobuf slice holds encoded data. It is initialized with a capacity of 500 bytes to avoid resizing during encoding. There is also some reserved room at the beginning to be able to encode the total size as a variable-width integer. Upon finalizing encoding, the remaining fields are added and the message length is prefixed:
func (schema *Schema) ProtobufMarshal(bf *FlowMessage) []byte  
    schema.ProtobufAppendVarint(bf, ColumnTimeReceived, bf.TimeReceived)
    schema.ProtobufAppendVarint(bf, ColumnSamplingRate, uint64(bf.SamplingRate))
    schema.ProtobufAppendIP(bf, ColumnExporterAddress, bf.ExporterAddress)
    schema.ProtobufAppendVarint(bf, ColumnSrcAS, uint64(bf.SrcAS))
    schema.ProtobufAppendVarint(bf, ColumnDstAS, uint64(bf.DstAS))
    schema.ProtobufAppendIP(bf, ColumnSrcAddr, bf.SrcAddr)
    schema.ProtobufAppendIP(bf, ColumnDstAddr, bf.DstAddr)
    // Add length and move it as a prefix
    end := len(bf.protobuf)
    payloadLen := end - maxSizeVarint
    bf.protobuf = protowire.AppendVarint(bf.protobuf, uint64(payloadLen))
    sizeLen := len(bf.protobuf) - end
    result := bf.protobuf[maxSizeVarint-sizeLen : end]
    copy(result, bf.protobuf[end:end+sizeLen])
    return result
 
Minimizing allocations is critical for maintaining encoding performance. The benchmark tests should be run with the -benchmem flag to monitor allocation numbers. Each allocation incurs an additional cost to the garbage collector. The Go profiler is a valuable tool for identifying areas of code that can be optimized:
$ go test -run=__nothing__ -bench=Netflow/with_encoding \
>         -benchmem -cpuprofile profile.out \
>         akvorado/inlet/flow
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
Netflow/with_encoding-12             143953              7955 ns/op            8256 B/op        134 allocs/op
PASS
ok      akvorado/inlet/flow     1.418s
$ go tool pprof profile.out
File: flow.test
Type: cpu
Time: Feb 4, 2023 at 8:12pm (CET)
Duration: 1.41s, Total samples = 2.08s (147.96%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web
After using the internal schema instead of code generated from the proto definition file, the performance improved. However, this comparison is not entirely fair as less information is being decoded and previously GoFlow2 was decoding to its own structure, which was then copied to our own version.
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              bench-2.txt                bench-3.txt              
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12      10.284    2%   7.758    3%  -24.56% (p=0.000 n=10)
Netflow/without_encoding-12    8.975    2%   7.304    2%  -18.61% (p=0.000 n=10)
Sflow/with_encoding-12         16.67    2%   14.26    1%  -14.50% (p=0.000 n=10)
Sflow/without_encoding-12      14.87    1%   13.56    2%   -8.80% (p=0.000 n=10)
As for testing, we use github.com/jhump/protoreflect: the protoparse package parses the proto definition file we generate and the dynamic package decodes the messages. Check the ProtobufDecode() method for more details.4 To get the final figures, I have also optimized the decoding in GoFlow2. It was relying heavily on binary.Read(). This function may use reflection in certain cases and each call allocates a byte array to read data. Replacing it with a more efficient version provides the following improvement:
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              bench-3.txt                bench-4.txt              
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12       7.758    3%   7.365    2%   -5.07% (p=0.000 n=10)
Netflow/without_encoding-12    7.304    2%   6.931    3%   -5.11% (p=0.000 n=10)
Sflow/with_encoding-12        14.256    1%   9.834    2%  -31.02% (p=0.000 n=10)
Sflow/without_encoding-12     13.559    2%   9.353    2%  -31.02% (p=0.000 n=10)
It is now easier to collect new data and the inlet component is faster!

Notice Some paragraphs were editorialized by ChatGPT, using editorialize and keep it short as a prompt. The result was proofread by a human for correctness. The main idea is that ChatGPT should be better at English than me.


  1. While empty fields are not serialized to Protocol Buffers, empty columns in ClickHouse take some space, even if they compress well. Moreover, unused fields are still decoded and they may clutter the interface.
  2. There is a similar function using NetFlow. NetFlow and IPFIX protocols are less complex to decode than sFlow as they are using a simpler TLV structure.
  3. vtprotobuf generates more optimized Go code by removing an abstraction layer. It directly generates the code encoding each field to bytes:
    if m.OutIfSpeed != 0  
        i = encodeVarint(dAtA, i, uint64(m.OutIfSpeed))
        i--
        dAtA[i] = 0x6
        i--
        dAtA[i] = 0xd8
     
    
  4. There is also a protoprint package to generate proto definition file. I did not use it.

Reproducible Builds: Reproducible Builds in January 2023

Welcome to the first report for 2023 from the Reproducible Builds project! In these reports we try and outline the most important things that we have been up to over the past month, as well as the most important things in/around the community. As a quick recap, the motivation behind the reproducible builds effort is to ensure no malicious flaws can be deliberately introduced during compilation and distribution of the software that we run on our devices. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.


News In a curious turn of events, GitHub first announced this month that the checksums of various Git archives may be subject to change, specifically that because:
the default compression for Git archives has recently changed. As result, archives downloaded from GitHub may have different checksums even though the contents are completely unchanged.
This change (which was brought up on our mailing list last October) would have had quite wide-ranging implications for anyone wishing to validate and verify downloaded archives using cryptographic signatures. However, GitHub reversed this decision, updating their original announcement with a message that We are reverting this change for now. More details to follow. It appears that this was informed in part by an in-depth discussion in the GitHub Community issue tracker.
The Bundesamt f r Sicherheit in der Informationstechnik (BSI) (trans: The Federal Office for Information Security ) is the agency in charge of managing computer and communication security for the German federal government. They recently produced a report that touches on attacks on software supply-chains (Supply-Chain-Angriff). (German PDF)
Contributor Seb35 updated our website to fix broken links to Tails Git repository [ ][ ], and Holger updated a large number of pages around our recent summit in Venice [ ][ ][ ][ ].
Noak J nsson has written an interesting paper entitled The State of Software Diversity in the Software Supply Chain of Ethereum Clients. As the paper outlines:
In this report, the software supply chains of the most popular Ethereum clients are cataloged and analyzed. The dependency graphs of Ethereum clients developed in Go, Rust, and Java, are studied. These client are Geth, Prysm, OpenEthereum, Lighthouse, Besu, and Teku. To do so, their dependency graphs are transformed into a unified format. Quantitative metrics are used to depict the software supply chain of the blockchain. The results show a clear difference in the size of the software supply chain required for the execution layer and consensus layer of Ethereum.

Yongkui Han posted to our mailing list discussing making reproducible builds & GitBOM work together without gitBOM-ID embedding. GitBOM (now renamed to OmniBOR) is a project to enable automatic, verifiable artifact resolution across today s diverse software supply-chains [ ]. In addition, Fabian Keil wrote to us asking whether anyone in the community would be at Chemnitz Linux Days 2023, which is due to take place on 11th and 12th March (event info). Separate to this, Akihiro Suda posted to our mailing list just after the end of the month with a status report of bit-for-bit reproducible Docker/OCI images. As Akihiro mentions in their post, they will be giving a talk at FOSDEM in the Containers devroom titled Bit-for-bit reproducible builds with Dockerfile and that my talk will also mention how to pin the apt/dnf/apk/pacman packages with my repro-get tool.
The extremely popular Signal messenger app added upstream support for the SOURCE_DATE_EPOCH environment variable this month. This means that release tarballs of the Signal desktop client do not embed nondeterministic release information. [ ][ ]

Distribution work

F-Droid & Android There was a very large number of changes in the F-Droid and wider Android ecosystem this month: On January 15th, a blog post entitled Towards a reproducible F-Droid was published on the F-Droid website, outlining the reasons why F-Droid signs published APKs with its own keys and how reproducible builds allow using upstream developers keys instead. In particular:
In response to [ ] criticisms, we started encouraging new apps to enable reproducible builds. It turns out that reproducible builds are not so difficult to achieve for many apps. In the past few months we ve gotten many more reproducible apps in F-Droid than before. Currently we can t highlight which apps are reproducible in the client, so maybe you haven t noticed that there are many new apps signed with upstream developers keys.
(There was a discussion about this post on Hacker News.) In addition:
  • F-Droid added 13 apps published with reproducible builds this month. [ ]
  • FC Stegerman outlined a bug where baseline.profm files are nondeterministic, developed a workaround, and provided all the details required for a fix. As they note, this issue has now been fixed but the fix is not yet part of an official Android Gradle plugin release.
  • GitLab user Parwor discovered that the number of CPU cores can affect the reproducibility of .dex files. [ ]
  • FC Stegerman also announced the 0.2.0 and 0.2.1 releases of reproducible-apk-tools, a suite of tools to help make .apk files reproducible. Several new subcommands and scripts were added, and a number of bugs were fixed as well [ ][ ]. They also updated the F-Droid website to improve the reproducibility-related documentation. [ ][ ]
  • On the F-Droid issue tracker, FC Stegerman discussed reproducible builds with one of the developers of the Threema messenger app and reported that Android SDK build-tools 31.0.0 and 32.0.0 (unlike earlier and later versions) have a zipalign command that produces incorrect padding.
  • A number of bugs related to reproducibility were discovered in Android itself. Firstly, the non-deterministic order of .zip entries in .apk files [ ] and then newline differences between building on Windows versus Linux that can make builds not reproducible as well. [ ] (Note that these links may require a Google account to view.)
  • And just before the end of the month, FC Stegerman started a thread on our mailing list on the topic of hiding data/code in APK embedded signatures which has been made possible by the Android APK Signature Scheme v2/v3. As part of this, they made an Android app that reads the APK Signing block of its own APK and extracts a payload in order to alter its behaviour called sigblock-code-poc.

Debian As mentioned in last month s report, Vagrant Cascadian has been organising a series of online sprints in order to clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). During January, a sprint took place on the 10th, resulting in the following uploads: During this sprint, Holger Levsen filed Debian bug #1028615 to request that the tracker.debian.org service display results of reproducible rebuilds, not just reproducible CI results. Elsewhere in Debian, strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, version 1.13.1-1 was uploaded to Debian unstable by Holger Levsen, including a fix by FC Stegerman (obfusk) to update a regular expression for the latest version of file(1) [ ]. (#1028892) Lastly, 65 reviews of Debian packages were added, 21 were updated and 35 were removed this month adding to our knowledge about identified issues.

Other distributions In other distributions:

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 231, 232, 233 and 234 to Debian:
  • No need for from __future__ import print_function import anymore. [ ]
  • Comment and tidy the extras_require.json handling. [ ]
  • Split inline Python code to generate test Recommends into a separate Python script. [ ]
  • Update debian/tests/control after merging support for PyPDF support. [ ]
  • Correctly catch segfaulting cd-iccdump binary. [ ]
  • Drop some old debugging code. [ ]
  • Allow ICC tests to (temporarily) fail. [ ]
In addition, FC Stegerman (obfusk) made a number of changes, including:
  • Updating the test_text_proper_indentation test to support the latest version(s) of file(1). [ ]
  • Use an extras_require.json file to store some build/release metadata, instead of accessing the internet. [ ]
  • Updating an APK-related file(1) regular expression. [ ]
  • On the diffoscope.org website, de-duplicate contributors by e-mail. [ ]
Lastly, Sam James added support for PyPDF version 3 [ ] and Vagrant Cascadian updated a handful of tool references for GNU Guix. [ ][ ]

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In January, the following changes were made by Holger Levsen:
  • Node changes:
  • Debian-related changes:
    • Only keep diffoscope s HTML output (ie. no .json or .txt) for LTS suites and older in order to save diskspace on the Jenkins host. [ ]
    • Re-create pbuilder base less frequently for the stretch, bookworm and experimental suites. [ ]
  • OpenWrt-related changes:
    • Add gcc-multilib to OPENWRT_HOST_PACKAGES and install it on the nodes that need it. [ ]
    • Detect more problems in the health check when failing to build OpenWrt. [ ]
  • Misc changes:
    • Update the chroot-run script to correctly manage /dev and /dev/pts. [ ][ ][ ]
    • Update the Jenkins shell monitor script to collect disk stats less frequently [ ] and to include various directory stats. [ ][ ]
    • Update the real year in the configuration in order to be able to detect whether a node is running in the future or not. [ ]
    • Bump copyright years in the default page footer. [ ]
In addition, Christian Marangi submitted a patch to build OpenWrt packages with the V=s flag to enable debugging. [ ]
If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can get in touch with us via:

28 January 2023

Craig Small: Fixing iCalendar feeds

The local government here has all the schools use an iCalendar feed for things like when school terms start and stop and other school events occur. The department s website also has events like public holidays. The issue is that all of them don t make it an all-day event but one that happens at midnight, or one past midnight. The events synchronise fine, though Google s calendar is known for synchronising when it feels like it, not at any particular time you would like it to.
Screenshot of Android Calendar showing a tiny bar at midnight which is the event.
Even though a public holiday is all day, they are sent as appointments for midnight. That means on my phone all the events are these tiny bars that appear right up the top of the screen and are easily missed, especially when the focus of the calendar is during the day. On the phone, you can see the tiny purple bar at midnight. This is how the events appear. It s not the calendar s fault, as far as it knows the school events are happening at midnight. You can also see Lunar New Year and Australia Day appear in the all-day part of the calendar and don t scroll away. That s where these events should be.
Why are all the events appearing at midnight? The reason is the feed is incorrectly set up and has the time. The events are sent in an iCalendar format and a typical event looks like this:
BEGIN:VEVENT
DTSTART;TZID=Australia/Sydney:20230206T000000
DTEND;TZID=Australia/Sydney:20230206T000000
SUMMARY:School Term starts
END:VEVENT
The event starting and stopping date and time are the DTSTART and DTEND lines. Both of them have the date of 2023/02/06 or 6th February 2023 and a time of 00:00:00 or midnight. So the calendar is doing the right thing, we need to fix the feed! The Fix I wrote a quick and dirty PHP script to download the feed from the real site, change the DTSTART and DTEND lines to all-day events and leave the rest of it alone.
<?php
$site = $_GET['s'];
if ($site == 'site1')  
    $REMOTE_URL='https://site1.example.net/ical_feed';
  elseif ($site == 'site2')  
    $REMOTE_URL='https://site2.example.net/ical_feed';
  else  
    http_response_code(400);
    die();
 
$fp = fopen($REMOTE_URL, "r");
if (!$fp)  
    die("fopen");
 
header('Content-Type: text/calendar');
while (( $line = fgets($fp, 1024)) !== false)  
    $line = preg_replace(
        '/^(DTSTART DTEND);[^:]+:([0-9] 8 )T000[01]00/',
        '$ 1 ;VALUE=DATE:$ 2 ',
        $line);
    echo $line;
 
?>
It s pretty quick and nasty but gets the job done. So what is it doing?
  • Lines 2-10: Check the given variable s and match it to either site1 or site2 to obtain the URL. If you only had one site to fix you could just set the REMOTE_URL variable.
  • Lines 12-15: A typical fopen() and nasty error handling.
  • Line 16: set the content type to a calendar.
  • Line 17: A while loop to read the contents of the remote site line by line.
  • Line 18-21: This is where the magic happens, preg_replace is a Perl regular expression replacement. The PCRE is:
    • Finding lines starting with DTSTART or DTEND and store it in capture 1
    • Skip everything that isn t a colon. This is the timezone information. I wasn t sure if it was needed and how to combine it so I took it out. All the all-day events I saw don t have a time zone.
    • Find 8 numerics (this is for YYYYMMDD) and store it in capture 2.
    • Scan the Time part, a literal T then HHMMSS. Some sites use midnight some use one minute past, so it covers both.
    • Replace the line with either DTSTART or DTEND (capture 1), set the value type to DATE as the default is date/time and print the date (capture 2).
  • Line 22: Print either the modified or original line.
You need to save the script on your web server somewhere, possibly with an alias command. The whole point of this is to change the type from a date/time to a date-only event and only print the date part of it for the start and end of it. The resulting iCalendar event looks like this:
BEGIN:VEVENT
DTSTART;VALUE=DATE:20230206
DTEND;VALUE=DATE:20230206
SUMMARY:School Term starts
END:VEVENT
The calendar then shows it properly as an all-day event. I would check the script works before doing the next step. You can use things like curl or wget to download it. If you use a normal browser, it will probably just download the translated file. If you re not seeing the right thing then it s probably the PCRE failing. You can check it online with a regex checker such as https://regex101.com. The site has saved my PCRE and match so you got something to start with. Calendar settings The last thing to do is to change the URL in your calendar settings. Each calendar system has a different way of doing it. For Google Calendar they provide instructions and you want to follow the section titled Use a link to add a public Calendar . The URL here is not the actual site s URL (which you would have put into the REMOTE_URL variable before) but the URL of your script plus the ?s=site1 part. So if you put your script aliased to /myical.php and the site ID was site1 and your website is www.example.com the URL would be https://www.example.com/myical.php?s=site1 . You should then see the events appear as all-day events on your calendar.

10 January 2023

Matthew Garrett: Integrating Linux with Okta Device Trust

I've written about bearer tokens and how much pain they cause me before, but sadly wishing for a better world doesn't make it happen so I'm making do with what's available. Okta has a feature called Device Trust which allows to you configure access control policies that prevent people obtaining tokens unless they're using a trusted device. This doesn't actually bind the tokens to the hardware in any way, so if a device is compromised or if a user is untrustworthy this doesn't prevent the token ending up on an unmonitored system with no security policies. But it's an incremental improvement, other than the fact that for desktop it's only supported on Windows and MacOS, which really doesn't line up well with my interests.

Obviously there's nothing fundamentally magic about these platforms, so it seemed fairly likely that it would be possible to make this work elsewhere. I spent a while staring at the implementation using Charles Proxy and the Chrome developer tools network tab and had worked out a lot, and then Okta published a paper describing a lot of what I'd just laboriously figured out. But it did also help clear up some points of confusion and clarified some design choices. I'm not going to give a full description of the details (with luck there'll be code shared for that before too long), but here's an outline of how all of this works. Also, to be clear, I'm only going to talk about the desktop support here - mobile is a bunch of related but distinct things that I haven't looked at in detail yet.

Okta's Device Trust (as officially supported) relies on Okta Verify, a local agent. When initially installed, Verify authenticates as the user, obtains a token with a scope that allows it to manage devices, and then registers the user's computer as an additional MFA factor. This involves it generating a JWT that embeds a number of custom claims about the device and its state, including things like the serial number. This JWT is signed with a locally generated (and hardware-backed, using a TPM or Secure Enclave) key, which allows Okta to determine that any future updates from a device claiming the same identity are genuinely from the same device (you could construct an update with a spoofed serial number, but you can't copy the key out of a TPM so you can't sign it appropriately). This is sufficient to get a device registered with Okta, at which point it can be used with Fastpass, Okta's hardware-backed MFA mechanism.

As outlined in the aforementioned deep dive paper, Fastpass is implemented via multiple mechanisms. I'm going to focus on the loopback one, since it's the one that has the strongest security properties. In this mode, Verify listens on one of a list of 10 or so ports on localhost. When you hit the Okta signin widget, choosing Fastpass triggers the widget into hitting each of these ports in turn until it finds one that speaks Fastpass and then submits a challenge to it (along with the URL that's making the request). Verify then constructs a response that includes the challenge and signs it with the hardware-backed key, along with information about whether this was done automatically or whether it included forcing the user to prove their presence. Verify then submits this back to Okta, and if that checks out Okta completes the authentication.

Doing this via loopback from the browser has a bunch of nice properties, primarily around the browser providing information about which site triggered the request. This means the Verify agent can make a decision about whether to submit something there (ie, if a fake login widget requests your creds, the agent will ignore it), and also allows the issued token to be cross-checked against the site that requested it (eg, if g1thub.com requests a token that's valid for github.com, that's a red flag). It's not quite at the same level as a hardware WebAuthn token, but it has many of the anti-phishing properties.

But none of this actually validates the device identity! The entire registration process is up to the client, and clients are in a position to lie. Someone could simply reimplement Verify to lie about, say, a device serial number when registering, and there'd be no proof to the contrary. Thankfully there's another level to this to provide stronger assurances. Okta allows you to provide a CA root[1]. When Okta issues a Fastpass challenge to a device the challenge includes a list of the trusted CAs. If a client has a certificate that chains back to that, it can embed an additional JWT in the auth JWT, this one containing the certificate and signed with the certificate's private key. This binds the CA-issued identity to the Fastpass validation, and causes the device to start appearing as "Managed" in the Okta device management UI. At that point you can configure policy to restrict various apps to managed devices, ensuring that users are only able to get tokens if they're using a device you've previously issued a certificate to.

I've managed to get Linux tooling working with this, though there's still a few drawbacks. The main issue is that the API only allows you to register devices that declare themselves as Windows or MacOS, followed by the login system sniffing browser user agent and only offering Fastpass if you're on one of the officially supported platforms. This can be worked around with an extension that spoofs user agent specifically on the login page, but that's still going to result in devices being logged as a non-Linux OS which makes interpreting the logs more difficult. There's also no ability to choose which bits of device state you log: there's a couple of existing integrations, and otherwise a fixed set of parameters that are reported. It'd be lovely to be able to log arbitrary material and make policy decisions based on that.

This also doesn't help with ChromeOS. There's no real way to automatically launch something that's bound to localhost (you could probably make this work using Crostini but there's no way to launch a Crostini app at login), and access to hardware-backed keys is kind of a complicated topic in ChromeOS for privacy reasons. I haven't tried this yet, but I think using an enterprise force-installed extension and the chrome.enterprise.platformKeys API to obtain a device identity cert and then intercepting requests to the appropriate port range on localhost ought to be enough to do that? But I've literally never written any Javascript so I don't know. Okta supports falling back from the loopback protocol to calling a custom URI scheme, but once you allow that you're also losing a bunch of the phishing protection, so I'd prefer not to take that approach.

Like I said, none of this prevents exfiltration of bearer tokens once they've been issued, and there's still a lot of ecosystem work to do there. But ensuring that tokens can't be issued to unmanaged machines in the first place is still a step forwards, and with luck we'll be able to make use of this on Linux systems without relying on proprietary client-side tooling.

(Time taken to code this implementation: about two days, and under 1000 lines of new code. Time taken to figure out what the fuck to write: rather a lot longer)

[1] There's also support for having Okta issue certificates, but then you're kind of back to the "How do I know this is my device" situation

comment count unavailable comments

Next.

Previous.